There are actually a lot of things that can come into play here.
First up, there's driver integration, or the "summing" of the sound coming from the various drivers. Many bookshelf-sized speakers are a simple two-way design, i.e. one tweeter and one woofer in close proximity to one another with a very simple cross-over. Such speakers can essentially act as a "point source", even when you are sitting quite close to them. In essence, the sound seems to come from a single point in space, rather than from two separate drivers.
With a larger tower speaker, you will often have more drivers - perhaps something like a tweeter, a mid-range driver plus two or more bass woofers. The cross-over is much more complicated with a 2.5-way, 3-way or 4-way design. In order for the sound from all of those drivers to "sum" properly and seem as though it comes from a point source rather than several separate drivers, you have to sit considerably further away.
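To put rough numbers on the distance part of that, here is a minimal Python sketch of just the geometry: how much farther the sound from one driver has to travel to your ear than the sound from another, and how that gap shrinks as you move back. The driver heights, ear height and the 2 kHz frequency are made-up illustration values, not measurements of any real speaker, and the sketch ignores cross-over phase and driver directivity entirely, so treat it as one simple contributor to "summing" rather than the whole story.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for room-temperature air


def path_difference(distance_m, driver_a_height_m, driver_b_height_m, ear_height_m):
    """Difference in straight-line path length from two drivers to the ear."""
    path_a = math.hypot(distance_m, driver_a_height_m - ear_height_m)
    path_b = math.hypot(distance_m, driver_b_height_m - ear_height_m)
    return abs(path_a - path_b)


def phase_offset_degrees(path_diff_m, frequency_hz):
    """Phase offset that a given path difference causes at one frequency."""
    wavelength_m = SPEED_OF_SOUND / frequency_hz
    return 360.0 * path_diff_m / wavelength_m


# Hypothetical tower: tweeter at ear height (1.0 m), a woofer 0.5 m lower.
for distance in (1.0, 2.0, 3.0, 4.0):
    diff = path_difference(distance, 1.0, 0.5, 1.0)
    phase = phase_offset_degrees(diff, 2000.0)  # 2 kHz is an assumed example
    print(f"{distance:.1f} m: path difference {diff * 100:.1f} cm, "
          f"about {phase:.0f} degrees at 2 kHz")
```

The farther apart the drivers are on the baffle, the longer it takes for that path difference to become small, which is one reason a tall multi-driver tower wants more listening distance than a compact two-way.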
So that brings up another factor - seating distance. The closer you are to a speaker, the more you hear the direct sound coming from the speaker vs. the reflected sound off of walls, ceiling and floor, and other objects in the room. With a tower speaker, you are forced to sit further away so that the drivers can "sum". But, as a result, you also end up hearing proportionally more reflected sound. This makes the sound seem more "spacious" as a whole, but also less pin-point precise.
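As a rough illustration of that direct-vs-reflected trade-off, the sketch below uses a common textbook idealization: direct sound from a small source drops about 6 dB per doubling of distance, while the reverberant level in the room stays roughly constant. Under that assumption, the direct-to-reverberant ratio depends only on how far you sit relative to the room's "critical distance" (the point where the two are equal). The 1.5 m critical distance here is an assumed placeholder; real rooms vary a lot and are not perfectly diffuse.

```python
import math


def direct_to_reverberant_db(distance_m, critical_distance_m):
    """Direct-to-reverberant ratio in dB under a simple diffuse-field model.

    0 dB means direct and reverberant sound are equally loud; negative
    values mean the reflected sound dominates.
    """
    return 20.0 * math.log10(critical_distance_m / distance_m)


ASSUMED_CRITICAL_DISTANCE_M = 1.5  # hypothetical value for illustration only

for distance in (1.0, 1.5, 3.0, 4.5):
    ratio = direct_to_reverberant_db(distance, ASSUMED_CRITICAL_DISTANCE_M)
    print(f"{distance:.1f} m: direct-to-reverberant ratio {ratio:+.1f} dB")
```

Every doubling of seating distance costs about 6 dB of direct sound relative to the room, which lines up with the "more spacious but less pin-point" impression described above.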
There are also human hearing factors involved. Our hearing is more directional at higher frequencies. Bookshelf speakers tend to produce a lot less low frequency output than tower speakers. If you are hearing proportionally more high frequencies, you will tend to find that sound more directional. This is actually one of the reasons why I strongly favor separating the bass frequency production - allowing subwoofers to handle the low bass while using smaller speakers to handle the mid-range and treble. The subwoofers can be optimally placed for even, directionless bass sound while the speakers can be separately placed for optimal directionality.
There are psycho-acoustics involved. When we SEE a bigger speaker, we HEAR a "bigger" sound - even if it doesn't actually exist! It is amazing what surprising results true double-blind listening tests can reveal. Without having our vision involved, we report hearing sounds that are much closer to what anechoic measurements verify as what was actually produced by the speakers. But when we SEE the speakers, we report hearing all kinds of things that cannot be objectively measured! My favorite tests are the ones where new speakers are brought into the room and people remark on the huge difference in sound that they heard. Then, it is later revealed that neither of the speakers they SAW was ever actually playing! Instead, in both cases, the same, unseen speakers were actually playing! The sound was always identical, but the visual difference created a psycho-acoustic difference that people SWORE they heard.
Back in REAL difference land, the room once again comes into play via structure-borne transmission. Very few people (sadly) decouple their speakers. Most people have spikes or cones or some other type of "feet" on the bottom of their tower speakers. Those "feet" couple tower speakers to the floor. And every little movement of the towers' cabinets is then physically transmitted into the floor. The floor (yes, even a concrete floor) shakes in sympathy with the speakers. The floor shakes the walls, the walls shake the ceiling. And the result is sound that actually emanates from the room surfaces themselves!
Bookshelf speakers should also be decoupled, but due to their smaller size and fewer drivers, they typically create less powerful vibrations in the first place, plus they have a table or stand in between them and the floor. That table or stand will shake in sympathy with the bookshelf speakers, which is why bookshelf speakers - just like ALL speakers - should be decoupled. But less physical shaking is transmitted into the room surfaces themselves as the bookshelf speakers are somewhat more separated and not directly in contact with the floor the way most people's towers are.
The cabinets of the speakers also produce sound. And with substantially larger cabinets, towers have more "cabinet coloration" of the sound than bookshelf speakers.
The bottom line of all of this is that, with a tower speaker, there are more sources and more opportunities for you to hear sound that is coming from somewhere OTHER than the direct sound of the speaker itself. The cabinet is making more noise; more noise is being transmitted into and by the structure of the room itself; the speaker is producing more bass, which, in turn, interacts more forcefully with the room; the cross-over network is more complicated, which can alter the sound that is actually coming from the drivers; tower speakers call for more amplifier power, which can reveal deficiencies in your power amps that bookshelf speakers won't; you are likely sitting farther away from tower speakers, which means proportionally more reflected sound; the greater number of drivers means that a tower might not properly "sum" depending on how far away you are sitting. And then there's actually the most likely culprit, which is that your EYES are tricking your EARS into "hearing" a difference that doesn't actually exist!
So there can be a lot going on, but it isn't a hard-and-fast rule. It's entirely possible for a tower speaker to image just as well as a bookshelf, just as a bookshelf/subwoofer combo can sound just as spacious and open and "deep" as a tower. The trick is to acoustically treat the room, eliminate structure-borne sound transmission by decoupling ALL speakers and subwoofers, and use proper placement so that the drivers of any given speaker have enough distance to "sum" properly into an apparent point source. Have ample amplifier power available so that the speakers are not "choked" by inadequate power or worse, amplifier clipping/distortion. And finally, don't let your eyes deceive you!
Hope that helps!