Time and phase
My apologies for the delay in responding.
'Time' is often confused with 'phase'-- in speaker design, in recording and mixing, and in hearing versus measuring.
To me this becomes clear when first principles are compared:
A sound goes into the mic.
We watch 'it' go by in our Windows Player and see a complex wave. We can use wonderful programs to take 'it' apart and show us its constituent frequencies, each coming and going with a particular loudness at every instant.
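Those 'wonderful programs' are doing a Fourier analysis under the hood. A bare-bones sketch of the idea (the sample rate, length, and the 440 Hz + 1000 Hz test wave are all made-up numbers, pure Python just for illustration):

```python
import math

FS, N = 8000, 800                      # assumed sample rate (Hz) and length
t = [n / FS for n in range(N)]
wave = [0.7 * math.sin(2 * math.pi * 440 * x)
        + 0.2 * math.sin(2 * math.pi * 1000 * x) for x in t]

def tone_amplitude(freq):
    """Correlate the wave against a sine/cosine pair at freq --
    the core of what an FFT does for every frequency at once."""
    re = sum(s * math.cos(2 * math.pi * freq * x) for s, x in zip(wave, t))
    im = sum(s * math.sin(2 * math.pi * freq * x) for s, x in zip(wave, t))
    return 2.0 * math.hypot(re, im) / N

print(round(tone_amplitude(440), 2))   # recovers the 0.7 loudness
print(round(tone_amplitude(1000), 2))  # recovers the 0.2 loudness
```

Each constituent tone's loudness is still sitting inside the complex wave, waiting to be pulled out.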
A sound goes into the ear.
We hear 'it' and know where it is, what it is, and if we've heard it before, know how it compares. All in a blink. Anything urgent in 'it'? Blink again...
"...ou want fries with that?"
===================
Two sounds go into one mic.
We see a far more complicated waveform, which we can still take apart and analyze. Yet, if any computer could pull out and display the two original waveforms, then FedEx's computer would also recognize what I said.
Two sounds go into one ear.
We hear two 'things', which can remain in memory as two separate sounds, as well as what they sounded like together.
===================
Phase
On the `scope, we can observe the tones inside any wave, seeing when they come and go and how loud they are. If we must use the word 'phase' to help describe a particular feature, we must first choose a starting point to be 'in/out of phase' with, by some number of degrees.
360 degrees around the phase-o-meter represents one full cycle of 'that', perhaps the second cycle of the lowest tone arriving 360 degrees after its very first cycle happened along.
Phase, by definition, is relative. Relative to whatever you choose.
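That relativity is easy to put numbers on. A tiny sketch (the 100 Hz tone and the time values are assumed, just for illustration): the very same instant gets a different 'phase' depending on which reference point you choose.

```python
FREQ = 100.0  # an assumed 100 Hz test tone

def phase_deg(t, t_ref=0.0):
    """Phase of the tone at time t, in degrees, relative to a chosen
    reference instant t_ref. 360 degrees = one full cycle."""
    cycles = (t - t_ref) * FREQ
    return (cycles * 360.0) % 360.0

# The same instant, measured against two different references:
print(phase_deg(0.0025))                 # ~90 degrees past t = 0
print(phase_deg(0.0025, t_ref=0.00125))  # ~45 degrees past the new reference
```

Neither answer is wrong; each is only an answer relative to the chosen starting point.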
===================
Time
From that super-complex waveform, we hear two separate sounds (especially if we've heard at least one before) and they change as time passes.
While no one is certain how, we do know part of what keeps them separated is we somehow latch onto the sound of the initial transients of each new word and note. That 'sound' lets us follow the rest of 'it' along, albeit with constant reminders of 'it' at the start of each new sound from 'it', reinforced by any expectations from previous experience (= trained listener). That 'sound' is inside the 'shape' of the waveform.
Time, by definition, is relative. Relative to whatever you choose.
"This is Your Brain on Music", written by a famous recording engineer, cites much research done on exactly the above.
===================
Where does this leave us?
We know a computer cannot separate two or more complex sounds into their original parts-- not because 'they are no longer in the summed signal, lost to some weird phase combination'. It can't because we have no real idea how to program it, and may never, as that would mean understanding completely how we hear. Which is even more complicated than vision. And why FedEx's computer hung up on me.
That two or more complex waves remain 'inside' that one super-complex waveform is proven mathematically and experimentally (search wave+superposition) and by ear, as we can hear them. In this photo, one wave travels down a string, then two waves from opposite directions 'pass through' each other, and finally two waves from opposite directions and of opposite polarity 'pass'. Each emerges unscathed.
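The superposition claim is easy to check numerically. A minimal sketch (the sample rate and the two stand-in 'instruments' are assumed values): sum two waves, then subtract one and watch the other emerge unscathed.

```python
import math

FS, N = 8000, 800                      # assumed sample rate (Hz) and length
t = [n / FS for n in range(N)]

flute  = [0.3 * math.sin(2 * math.pi * 440 * x) for x in t]        # stand-in
violin = [0.5 * math.sin(2 * math.pi * 523 * x + 1.0) for x in t]  # stand-in

# Superposition: the 'mix' is just the sample-by-sample sum.
mix = [a + b for a, b in zip(flute, violin)]

# Both originals are still 'inside' the mix: subtract one and the other
# comes back, to floating-point precision.
recovered = [m - a for m, a in zip(mix, flute)]
worst = max(abs(r - v) for r, v in zip(recovered, violin))
print(worst)  # vanishingly small
```

The catch, of course, is that this subtraction only works because we already had one of the originals in hand-- which is exactly what no computer can do from the mix alone.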
Just because the strings came in while the woodwinds played does not make the woodwinds sound different, but they radically change the woodwinds' waveform on a `scope.
When complex waves are added, what we see can be described as 'changes in Phase' relative to some point, and by 'changes in the constituent tones as Time flows' relative to some starting time. Each has its uses. Specifically, for speaker design--
We don't know what's inside any complex waveform, so we'd best preserve its shape. Thus, I must make certain the lows, mids, and highs from my separate drivers arrive at your ears in the same sequence as recorded. That is the goal of time-coherent speaker design. Which automatically means phase coherent-- no need to say "time- and phase-coherent".
But any speaker can be phase coherent among any two of its drivers, without being time-coherent. This only means the peaks and valleys of the SAME SINE WAVE at the crossover frequency ONLY, coming from each driver, line up on the `scope.
It does not say each began at the same time. At best, it means one driver is 360 degrees ahead of the other, one full cycle, and thus 'back in phase'. Perhaps it is 180 degrees ahead, and thus appears upside down, so we flip the tweeter's polarity mistakenly thinking "well, that's 180 degrees I just added, so they are back in phase." No, we only flipped the polarity-- its output is still arriving too soon relative to the mid driver at that ONE crossover frequency.
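One way to see the difference: a polarity flip is the same 180 degrees at every frequency, while a real head-start in time produces a phase shift that grows with frequency-- the two only line up at a single frequency. A sketch (the 3 kHz crossover and the half-cycle tweeter lead are assumed, illustrative numbers):

```python
XOVER = 3000.0               # assumed crossover frequency, Hz
LEAD = 1.0 / (2.0 * XOVER)   # tweeter arrives half a cycle early at XOVER

def delay_phase_deg(freq_hz, lead_s=LEAD):
    """Phase shift produced by a pure time offset: proportional to frequency."""
    return lead_s * freq_hz * 360.0

def flip_phase_deg(freq_hz):
    """A wiring (polarity) flip 'shifts' by the same 180 degrees everywhere."""
    return 180.0

for f in (1500.0, 3000.0, 6000.0):
    print(f, round(delay_phase_deg(f), 1), flip_phase_deg(f))
# The half-cycle lead reads ~90, ~180, ~360 degrees across those three
# frequencies; the flip reads 180 at all of them.
```

They agree only at 3 kHz-- which is why flipping the tweeter can make the `scope look right at the crossover frequency while the timing error remains.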
In higher-order crossovers, that exact 360- or 180-degree timing drifts apart by a different amount at each frequency above and below the crossover, leading to even more discrepancies between 'how the speaker sounds' and how it measures.
In a proper first-order crossover speaker, the timings do not drift apart. So there is a consistency to the sound on any music and gear, at least with our speakers, which is what lies behind the comments of our Owners.
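That 'no drift' property can be checked directly in the ideal math: a first-order lowpass and highpass sum to exactly 1 at every frequency-- flat magnitude AND zero phase shift. A sketch (the 2 kHz crossover is an assumed number, and real drivers are idealized away):

```python
import cmath, math

XOVER = 2000.0  # assumed crossover frequency, Hz

def lowpass(f):
    """Ideal first-order lowpass, normalized to the crossover frequency."""
    s = 1j * f / XOVER
    return 1 / (1 + s)

def highpass(f):
    """Ideal first-order highpass, same normalization."""
    s = 1j * f / XOVER
    return s / (1 + s)

# The pair sums to 1 at every frequency, so the drivers' combined output
# keeps the input's shape: no timing drift anywhere in the band.
for f in (200.0, 2000.0, 20000.0):
    total = lowpass(f) + highpass(f)
    print(f, abs(total), math.degrees(cmath.phase(total)))
```

Higher-order pairs do not share this property: their sum is an allpass at best, with phase that rotates through the crossover region.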
Rooms screw up time-coherent speakers, so why bother?
Of course, the room will color what is heard, and better listening rooms do not let reflections arrive too soon nor too loud, obscuring what is to be heard from the direct sound. Which will be the complex waveform arriving directly from the speakers, or none of us could point to someone apparently between our speakers. Research by Haas and by Dr. Daniel Queen provides good starting points.
Recordings have all sorts of phase issues.
The `scope visually shows all sorts of cancellations and reinforcements going on as two or more mics are mixed. But this is not happening audibly.
I hope the above helps, but just in case:
The situation is like this: if you have several mics spread around a group of musicians making a recording, the mic nearest the instrument in question will have the loudest signal from the instrument in front of it. Unless the musicians are in isolation booths, every mic will hear every instrument, but with varying intensity depending on location. There will also be varying time delays to all the other mics, depending on distance. Those delays can be calculated for every instrument at every mic, as a function of the distance from each source and the speed of sound.
Agreed.
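Putting numbers on those delays is simple arithmetic; a sketch with made-up stage distances:

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C

# Hypothetical stage layout: distance (meters) from instrument to mic.
distances_m = {
    ("violin", "mic_1"): 0.8,
    ("violin", "mic_2"): 4.5,
    ("cello",  "mic_1"): 3.9,
    ("cello",  "mic_2"): 1.1,
}

def delay_ms(meters):
    """Arrival delay in milliseconds for a given path length."""
    return meters / SPEED_OF_SOUND * 1000.0

for (source, mic), d in sorted(distances_m.items()):
    print(f"{source} -> {mic}: {delay_ms(d):5.2f} ms")
```

Roughly 2.9 milliseconds per meter of extra path-- each instrument reaches each mic at its own time, and those offsets ride along into the mix.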
Now the phase of a note from any particular instrument will be random, depending on the point in its cycle at which it hits the diaphragms of the various mics; that is the phase shift.
OK.
You can only make a time- and phase-coherent recording with a single pair of figure-of-8 or cardioid mics, one right above the other at right angles.
Actually, a spatially-coherent recording is being made, as this single-mic location/technique preserves the TIME and LOUDNESS of every instrument's constituent TONES relative (there's that word again) to every other instrument. Now, the mic, the mic-preamp and the recorder must handle the complex signal 'time-coherently', by not delaying any particular parts of the spectrum. They must not have 'phase shift' in layman's terms, to preserve clarity and transient response.
Multiple mics also preserve their own relative Times and Loudnesses, even after being mixed together-- to our ears, not to a `scope... even when they 'hear' the same sounds (as TLS wrote) from different distances at different loudnesses. And I would add, with different 'Timbres' (Tam-burrs), or 'Textures'.
Because of these three differences between multiple mics hearing a different version of the same thing, we still follow and enjoy the sound of the violins as picked up by their main microphone. Yet, if the engineer lets another mic hear too much of them too soon, then we hear a new timbre added to them which can be irritating, but they are still heard as separate violins.
Any other arrangement will have time and phase shifts.
OK as a definition, but I would have said "... will create time delays and therefore phase shifts."
Which means a less spatially-coherent recording. Which means less clear.
Which does NOT mean the sounds from each mic are altered by mixing them together-- they are only overlaid on each other. Which still means less clear.
If a speaker is NOT time-coherent, then we hear a distorted version of that now very-complex, not-as-clear wave, still audibly containing the one hundred members of the orchestra. Side note:
A) Most speakers have become more time-incoherent over the last 40 years.
B) The reliance on 'simple' recordings, such as Holly Cole, has greatly increased over this period.
Hmmm.
Then there is dummy-head recording, where mics are placed in the ear canals of a dummy head. Then time and phase are as they would be at a human head. This obviously only works for headphone listening. I hope this helps.
The dummy head receives a signal that, when reproduced over headphones, is spatially coherent. And for the best 'simulation', the headphones must be time-coherent, which most are, even $50 Sonys.
Best regards,
Roy