The recording's ambient field is on the recording, whether natural or synthetic. The room is not the ambient field of the recording.
I understand you use different drivers and are happy with the sound. I don't care if they offer 'exactly' the same sense of tonality. The same sense of tone can come from a different waveform. Imaging requires consistency in waveform: when the waveforms differ between the two speakers of a channel pair, there is no coherence between them. Any two channels in a multichannel recording form a channel pair. Similarity can still be pleasant, and sometimes very pleasant, but sameness is what is required to get the job done.
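To put a number on the "same tone, different waveform" point, here is a minimal sketch in Python. It is only an illustration under assumptions of mine: the three harmonic levels, the phase offsets, and the use of zero-lag normalized correlation as a stand-in for "coherence" are all made up for the example, and the names `synth`, `harmonic_magnitude`, and `correlation` are hypothetical helpers. It does not model real drivers or crossovers, and it says nothing about how audible the difference is.

```python
import math

def synth(amplitudes, phases, n=1024):
    """One period of a harmonic series: sum of a_k * sin(2*pi*k*t/n + phase_k)."""
    return [
        sum(a * math.sin(2 * math.pi * (k + 1) * t / n + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases)))
        for t in range(n)
    ]

def harmonic_magnitude(x, k):
    """Level of the k-th harmonic (single-bin DFT), ignoring its phase."""
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(x))
    return 2 * math.hypot(re, im) / n

def correlation(x, y):
    """Zero-lag normalized correlation: 1.0 means identical waveform shape."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))

# Assumed harmonic levels, shared by every speaker in the example.
amps = [1.0, 0.5, 0.25]

left = synth(amps, [0.0, 0.0, 0.0])
right_same = synth(amps, [0.0, 0.0, 0.0])                   # identical speaker
right_similar = synth(amps, [0.0, math.pi / 2, math.pi])    # same levels, shifted phases

same_pair = correlation(left, right_same)
similar_pair = correlation(left, right_similar)

print("harmonic levels, left:   ", [round(harmonic_magnitude(left, k), 3) for k in (1, 2, 3)])
print("harmonic levels, similar:", [round(harmonic_magnitude(right_similar, k), 3) for k in (1, 2, 3)])
print("correlation, identical pair:", round(same_pair, 3))
print("correlation, similar pair:  ", round(similar_pair, 3))
```

Both "speakers" show the same harmonic levels, so on a spectrum plot they look tonally identical, yet the pair with shifted phases correlates at only about 0.71 where the identical pair sits at 1.0. That is the gap between similar and same, at least in this toy model.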
A good pro soundstage uses the same speakers for all of the floor channels.
The THX standards were designed so the average consumer could physically and financially handle a multi-channel HT system: small speakers all around, with 80 Hz crossovers to the subs. In a small room, a consumer could use something like Bose cubes all around, add a small sub, and still get great 5.1 or 7.1 HT. A system like that, with exactly the same speakers in every position, will produce better multi-channel imaging than any other system, no matter the cost, where the speakers are just 'similar' in tonality.