Trying to find a great YouTube video that I saw, but it's a few years old now. I remember that the guy drew with a pencil on a large notepad for the explanation, and it was very easy to understand. He explained that in order to copy a waveform you only have to sample at just over 2x the max frequency (Nyquist's Theorem). It's pure math, and it showed that sampling faster than that does nothing to improve your ability to reproduce the original waveform. That is why it was so clever that the original CD spec chose 44.1kHz. The human ear goes up to around 20kHz, and 44.1kHz allows for reproducing up to 22.05kHz. Anything beyond that is pointless and just marketing hype (IMHO). You might hear some silly arguments about harmonics and such above the hearing range, but I don't know of any studies that back up those claims.
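For anyone who wants to check the arithmetic, here is a minimal Python sketch. The function names are my own and purely illustrative. It computes the highest frequency a given sample rate can represent (half the rate), and then shows what happens to a tone above that limit: sampled at 44.1kHz, a 25kHz tone produces exactly the same sample values as an inverted 19.1kHz tone, so the two cannot be told apart once sampled.

```python
import math


def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sample rate can represent: half the rate."""
    return sample_rate_hz / 2.0


def sample_tone(freq_hz: float, sample_rate_hz: float, count: int) -> list:
    """Return `count` samples of a unit-amplitude sine at freq_hz."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(count)
    ]


for rate in (44_100, 48_000, 96_000):
    print(f"{rate} Hz sampling -> content up to {nyquist_limit_hz(rate):.0f} Hz")

# A 25 kHz tone is above the 22.05 kHz limit of 44.1 kHz sampling.
# It folds back to 44.1 - 25 = 19.1 kHz, with the sign inverted.
above_limit = sample_tone(25_000, 44_100, 64)
folded = sample_tone(19_100, 44_100, 64)

# If the two sample sets are mirror images, every pairwise sum is ~0.
max_difference = max(abs(a + b) for a, b in zip(above_limit, folded))
print(f"largest mismatch between the two tones: {max_difference:.2e}")
```

This is only the sampling side of the story. A real converter also needs a filter to remove content above the limit before sampling, which the sketch leaves out.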
Interesting analysis on the ProTools web site:
https://www.protoolsproduction.com/44-1khz-vs-48khz-audio-which-is-better/
They mentioned improved "headroom" at 48kHz, but I don't understand how that applies. Headroom typically refers to the ability to exceed nominal levels and is measured in decibels, not hertz, unless they are referring to max frequency headroom. The discussion about aliasing would make sense when converting between sampling rates. If you convert from 96 to 48, it divides evenly, but if, say, you convert from 96 to 44.1 to create an audio CD, you'll potentially have rounding errors that can introduce aliasing. Whether that amount of aliasing is audible is another matter. They also claim that recording at 24 bits provides greater dynamic range than at 16 bits (and that 24-bit audio is the standard for DVDs). I'd be interested to see the math on that claim.
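As far as I know, the usual math behind the bit depth claim is that each extra bit doubles the number of amplitude steps, which adds about 6dB. Here is a rough Python sketch of that, plus the conversion ratios mentioned above. The function names are made up for illustration, and the dynamic range figure is the theoretical one for ideal linear PCM, ignoring dither and real converter noise.

```python
import math
from fractions import Fraction


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20.0 * math.log10(2 ** bits)


def resample_ratio(source_hz: int, target_hz: int) -> Fraction:
    """Target rate over source rate, reduced to lowest terms."""
    return Fraction(target_hz, source_hz)


for bits in (16, 24):
    print(f"{bits}-bit: about {dynamic_range_db(bits):.1f} dB")

for target in (48_000, 44_100):
    print(f"96000 -> {target}: ratio {resample_ratio(96_000, target)}")
```

So 16 bits works out to about 96dB and 24 bits to about 144dB. On the conversion side, 96 to 48 is a clean 1/2, while 96 to 44.1 reduces to 147/320, which is why that conversion needs interpolation rather than just dropping every other sample.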
The whole issue of sampling rates for streamed music only makes sense to me in one instance, and that is if you have a lossy internet connection. If you're streaming at 16/44.1 and your connection is dropping packets, that will be audible. If you're streaming at 24/192, dropped packets would likely not be noticeable, since you have 4x the amount of information you need to reproduce the music accurately. There is so much error correction built into modern infrastructure, though, that lost packets are pretty rare these days except for very poor internet connections.
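To put rough numbers on the amount of data involved, here is a small Python sketch comparing raw data rates. It assumes uncompressed stereo PCM and a 24-bit/192kHz high-resolution stream, and the function name is just for illustration. Real streaming services usually compress the audio, so actual bandwidth will differ.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits: int, channels: int = 2) -> int:
    """Raw data rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits * channels


cd_quality = pcm_bitrate_bps(44_100, 16)   # assumed stereo
high_res = pcm_bitrate_bps(192_000, 24)    # assumed stereo, 24/192

print(f"16/44.1: {cd_quality} bit/s")
print(f"24/192:  {high_res} bit/s")
print(f"data rate ratio:   {high_res / cd_quality:.2f}x")
print(f"sample rate ratio: {192_000 / 44_100:.2f}x")
```

Under those assumptions the sample rate alone is about 4.35x higher, and the raw data rate is about 6.5x higher once the extra bit depth is counted. This only compares data volume. It says nothing about how a given player copes with a missing packet.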