I would like to point out that the term "stereo" did not originally mean 2 channels. It refers to a 3D illusion, typically produced by multiple channels (i.e. more than one). Early versions used 3 channels, and the count dropped to 2 because the main medium was the record, which carries 2 channels.
So stereo, by its original definition, is in fact spatial audio, and the term does not define or limit the number of channels, other than requiring more than one.
Usage, as opposed to strict definition, has however turned "stereo" into "2-channel audio", and that is the de facto modern meaning of the term.
On other items of relevance: yes, we have 2 ears, and when using headphones with an appropriate HRTF (head-related transfer function), 2 channels can carry all the required spatial audio cues. (Can, but seldom do. Between theory and practice there remains a wide gulf, though there are excellent examples of successful methods.)
When using speakers in a room, we have to consider not only the two "receptors" (i.e. the ears) but also their heavily tailored surround (the outer ears), which alters the sound actually perceived depending on the direction from which it arrives (i.e. the HRTF, including ear shape, lobes etc.).
So for the "perfect" illusion, the sound needs either to arrive at our fleshy hearing appendages from differing directions (as per the place/event being aurally depicted), or to arrive directly at the ear (as per headphones) with HRTF processing applied to simulate the alterations that direction would have made.
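For anyone curious what "HRTF processing" looks like in practice, here is a rough sketch in Python (NumPy assumed). Headphone rendering boils down to convolving the source with one impulse response per ear. The function names, the 0.6 far-ear gain and the head radius are my own illustrative assumptions, and the toy responses only model the arrival-time and level differences between the ears. A real measured HRTF also carries the direction-dependent filtering of the outer ear, which this leaves out entirely.

```python
import numpy as np

def toy_hrir_pair(azimuth_deg, sample_rate=48000, length=64):
    """Hypothetical stand-in for a measured HRIR pair: delay and gain only.

    azimuth_deg: 0 = straight ahead, positive = source to the right.
    Returns (left_ir, right_ir).
    """
    if not -90 <= azimuth_deg <= 90:
        raise ValueError("this toy model only covers -90..90 degrees")
    head_radius_m = 0.0875    # assumed average head radius
    speed_of_sound = 343.0    # m/s
    az = np.deg2rad(abs(azimuth_deg))
    # Woodworth-style approximation of the interaural time difference
    itd_s = head_radius_m / speed_of_sound * (az + np.sin(az))
    delay = int(round(itd_s * sample_rate))
    # Arbitrary level drop at the far ear: 1.0 in front, 0.6 at the side
    far_gain = 1.0 - 0.4 * np.sin(az)
    near = np.zeros(length)
    far = np.zeros(length)
    near[0] = 1.0
    far[delay] = far_gain
    # Source on the right: the right ear is the near ear
    return (far, near) if azimuth_deg >= 0 else (near, far)

def render_binaural(mono, left_ir, right_ir):
    """Convolve a mono signal with each ear's response; returns (N, 2)."""
    left = np.convolve(mono, left_ir)
    right = np.convolve(mono, right_ir)
    return np.stack([left, right], axis=1)

# A single click placed hard right: it reaches the right ear first,
# then the left ear slightly later and quieter.
click = np.zeros(256)
click[0] = 1.0
left_ir, right_ir = toy_hrir_pair(90)
out = render_binaural(click, left_ir, right_ir)
```

Swap the toy pair for measured responses and the same two convolutions are the whole headphone trick, which is also why it only works properly when those responses suit the listener's own ears.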
Sometimes, in some rooms, with some speakers, and some recordings, the illusion can be astounding!
But listen to a well-made binaural recording, and it is an order of magnitude better. We have a ways to go!
IMO multichannel (more than 2) is a step along the path to improving the sound field in a shareable "room" way.