So part of the original audio from the 10lb master is omitted to make it fit in an 8lb CD bag, and I'm told that this is not a form of compression. Okay, what is the correct terminology?
- Converting from one sample rate to another is called resampling. If you start with 96 kHz, you have 96,000 samples per second and want to convert to 44.1 kHz, which is 44,100 samples per second. Conceptually, what is done is to 're-group' the samples so that there are now 44,100 samples for each second.
It is not a simple matter of just taking the first 44,100 samples and calling them second one, and the next 44,100 and calling them second two, because then you'd be left with 7,800 samples at the end, and what would you do with them? Nor does the algorithm simply grab the first 44,100 samples and discard samples 44,101 through 96,000.
Instead, for each original second of 96,000 samples an interpolation process is used - think of it as 'curve fitting'. You've got a set of points from 1 to 96,000 and you want to end up with the same (or very similar) shape of curve using only 44,100 points.
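To make the interpolation concrete, here's a minimal sketch using SciPy's polyphase resampler (the 1 kHz test tone is just an assumption for demonstration; real converters choose their own filter designs):

```python
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 96_000, 44_100      # 96000/44100 reduces to 320/147
t = np.arange(fs_in) / fs_in        # one second of input
x = np.sin(2 * np.pi * 1_000 * t)   # a 1 kHz test tone

# Interpolate up by 147, low-pass filter, then decimate down by 320:
# 96000 * 147 / 320 = 44100 samples out.
y = resample_poly(x, up=147, down=320)
print(len(x), len(y))               # 96000 44100
```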
Downsampling to a lower sample rate can cause 'aliasing', which is when a higher frequency captured at the higher initial sampling rate wraps around to a lower frequency. With 44.1 kHz, the highest frequency captured would be half that: 22.05 kHz. If the original contained 50 kHz, it would wrap around and become 5.9 kHz (50 - 44.1). 'Anti-aliasing' filters are used to eliminate that artifact before the rate is reduced.
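You can watch the wrap-around happen numerically. This toy sketch (the 50 kHz tone is an assumed input) samples a 50 kHz sine directly at 44.1 kHz with no anti-aliasing filter and finds the loudest frequency in the result:

```python
import numpy as np

fs = 44_100
t = np.arange(fs) / fs              # one second, so FFT bin width is 1 Hz
x = np.sin(2 * np.pi * 50_000 * t)  # 50 kHz tone, sampled far too slowly

spectrum = np.abs(np.fft.rfft(x))
peak_hz = np.argmax(spectrum) * fs / len(x)
print(peak_hz)                      # ~5900.0, the alias of 50 kHz
```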
- Converting from one bit depth to another is called bit depth conversion.
Going from 24 bit to 16 bit is not a simple matter of chopping off ('truncating') the low order 8 bits. Dither (low-level random noise) is added first, usually with 'noise shaping', which keeps most of that noise at a very low level (typically around -90 dB) and pushes it into frequency ranges where it is least audible. Then the low order 8 bits are truncated, leaving a 16 bit sample, and when all of the samples are likewise processed you end up with a waveform that is very much like the original.
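Here's a bare-bones sketch of the dither-then-truncate step (TPDF dither only, no noise shaping; the function name and scaling are illustrative assumptions):

```python
import numpy as np

def dither_24_to_16(samples_24: np.ndarray) -> np.ndarray:
    """Reduce 24-bit integer samples to 16-bit with TPDF dither.

    One 16-bit step equals 256 (2**8) at 24-bit scale, so we rescale,
    add triangular dither of roughly +/-1 LSB, then round to 16 bits.
    Real converters add noise shaping on top of this.
    """
    rng = np.random.default_rng()
    tpdf = (rng.uniform(-0.5, 0.5, samples_24.shape)
            + rng.uniform(-0.5, 0.5, samples_24.shape))
    scaled = samples_24 / 256.0 + tpdf
    return np.clip(np.round(scaled), -32_768, 32_767).astype(np.int16)
```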
Is the full dynamic range from the master there?
If the master was recorded digitally at a higher bit depth and sampling frequency, then no. Higher bit depths capture a greater dynamic range and higher sampling frequencies capture higher frequencies in the original audio (Nyquist/Shannon sampling theorem again).
The higher frequencies above our range of hearing don't matter, and it's OK to get rid of them without altering our perception of the sound. The extra dynamic range possible from a higher bit depth is not much of a problem either - how much music truly has a dynamic range of 144 dB? Even if it did, could you hear that large a difference between the loudest and softest sounds? Think about this: even with good headphones on, watching the meters in Sound Forge as the music fades out, I can't hear a thing once the level drops to about -60 dB.
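Those dB figures fall straight out of the bit depth, at roughly 6 dB per bit:

```python
import math

for bits in (16, 24):
    # Ideal quantization dynamic range: 20 * log10(2**bits), ~6.02 dB per bit.
    print(f"{bits}-bit: ~{20 * math.log10(2 ** bits):.0f} dB")
# 16-bit: ~96 dB
# 24-bit: ~144 dB
```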
Lossy formats like MP3 omit part of the music from the CD to make it fit a 3lb MP3 bag, and that is compression. Okay, fine. In both cases a decision is made to omit the bits of the original audio that you are least likely to miss, and both omit some part of the original to fit a size restriction, yet one is compressed and the other is not. Correct?
Right, and I know it sounds like semantic hair-splitting, but it really isn't. PCM is the lowest common denominator in digital audio, and everything is built on top of it. MP3s are PCM underneath too - BUT they are not raw sample values that can be read directly, one by one, by a DAC and converted to analog.
Just as you can't 'type' or 'cat' a .zip file on your hard disk and read its contents, an MP3 has to be decoded to get back to the raw underlying content. The MP3 decoder decodes the encoded data and produces a stream of raw PCM samples. Note that it is the same for Dolby Digital, DTS, etc. (except the encoding is obviously different from MP3, which is different from AAC, and so on).
An MP3 encoder effectively 'resamples' on the fly. It reads the raw PCM samples, analyzes them, and groups them into bands, both by frequency and by time. Each of those bands is analyzed, and anything the psychoacoustic model deems inaudible within the band is discarded - for example, a soft sound that is immediately preceded or followed by a louder sound gets thrown away, because the loud sound would 'mask' the soft one and you likely wouldn't hear it anyway.
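Real encoders use full psychoacoustic models over critical bands, but a vastly simplified sketch of the 'discard what the loud parts mask' idea might look like this (the single global threshold is an assumption; actual MP3 masking works per band and across time):

```python
import numpy as np

def crude_mask(frame: np.ndarray, threshold_db: float = -40.0) -> np.ndarray:
    """Zero out spectral components far below the frame's loudest component.

    A toy stand-in for masking, nothing like a real MP3 model: it keeps
    only the bins within threshold_db of the peak and discards the rest.
    """
    spectrum = np.fft.rfft(frame)
    mags = np.abs(spectrum)
    peak = max(float(mags.max()), 1e-12)
    keep = 20 * np.log10(np.maximum(mags, 1e-12) / peak) > threshold_db
    return np.fft.irfft(spectrum * keep, n=len(frame))
```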
In a nutshell
All of this is very complicated and mathematically intensive. The benefit of higher sampling rates and higher bit depths in recording is not so much that we need to capture very high frequencies (which we can't hear) or preserve a huge dynamic range - most music simply doesn't have a dynamic range much greater than 96 dB, which is the limit for CD.
The real benefit shows up in the post-processing stage, where the extra precision reduces math errors like rounding and overflow. If you multiply two 16 bit numbers (say, when doing 'normalization'), the result can be greater than what fits in 16 bits. The higher bit depth and sampling rate give us headroom to work with and prevent those kinds of errors.
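A quick sketch of that failure mode (the 2x gain is an arbitrary assumption standing in for normalization):

```python
import numpy as np

x = np.array([30_000, -28_000], dtype=np.int16)

# Doubling in 16-bit integer math wraps around: 60000 overflows to -5536.
wrapped = (x * np.int16(2)).astype(np.int16)

# Doing the math in 32-bit float gives headroom; clip and convert at the end.
safe = np.clip(x.astype(np.float32) * 2.0, -32_768, 32_767).astype(np.int16)

print(wrapped)  # [-5536  9536] -> garbage from overflow
print(safe)     # [ 32767 -32768] -> clipped at full scale, but sane
```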
If you open a 16 bit CD track in Sound Forge, it will convert it to 32-bit floating point so you can manipulate it. When you are done and save the file, it converts it back to 16 bit.