Goliath,
I think one of the points you might be missing is that, as Swerd said, "Just because a listening test is done under blinded conditions doesn't automatically make it a valid test. It has to establish just what the listeners can detect with the listening conditions available during the test."
What I think underlies this quote is that we can't guarantee the conditions required for a valid blind test unless it is run following the recommended guidelines established in BS.1116 or other ITU documents on blind testing.
Most blind testing that we see on forums falls far, far short of these guidelines. The typical home test conditions are so varied, & the listeners & playback systems so unqualified (unqualified in the sense that we don't know what level of difference is actually discernible with them), that we need some self-reporting measures included in the test to verify how valid it is for detecting the sort of differences being examined.
Let's take ABX testing as the typical blind test we see on audio forums. The definition of this test has been quoted to me on HydrogenAudio as "ABX is a minimalistic method to test for false positives" - in other words, it minimises hearing a difference when it's not there. (Yes, I'm involved in an ABX thread on HydrogenAudio much along the same lines as here, but more vicious & personal.) This sounds like a very reasonable approach, but let's look at the implications. What if the test is organised in such a way that only really gross differences will be noticed, i.e. the listeners or test conditions don't allow subtle differences to be recognised - maybe the playback system has a level of noise that masks such subtleties, maybe the background noise does, whatever? Do the ABX results coming from such a test give any indication that such a condition pertains, i.e. that subtle differences can't be heard with this particular test run?
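To put some rough numbers on that worry - just my own sketch of the statistics, assuming a typical 16-trial ABX run scored with a one-sided binomial test at the usual 5% significance level, using Python/scipy for the arithmetic:

```python
# A minimal sketch (my own illustration, not any forum's official procedure) of
# why a test that controls false positives can still be insensitive.
# Assumptions: 16-trial ABX run, one-sided binomial test, 5% significance level.
from scipy.stats import binom

n_trials = 16
alpha = 0.05

# Smallest number of correct answers that is significant at the 5% level
# if the listener were purely guessing (p = 0.5): works out at 12/16.
threshold = next(k for k in range(n_trials + 1)
                 if binom.sf(k - 1, n_trials, 0.5) <= alpha)
print(f"Need at least {threshold}/{n_trials} correct for p <= {alpha}")

# Sensitivity (statistical power): the chance that a listener who genuinely
# hears the difference on 60% of trials (a subtle difference) reaches that
# threshold - only about 0.17, so most such runs will come out "null".
p_true = 0.60
power = binom.sf(threshold - 1, n_trials, p_true)
print(f"Power against a 60% listener: {power:.2f}")
```

So the false-positive side is well policed, but nothing in the reported "x/16, p = y" result tells you whether that particular run ever had a realistic chance of catching a subtle difference in the first place.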
This is not condemning all ABX tests, as many professionally run tests follow the recommended guidelines contained in the ITU documents, but it is suggesting that we need to pay attention to "Just because a listening test is done under blinded conditions doesn't automatically make it a valid test."
My feeling about the results from these home-run tests is that we have a very wide range in the sensitivity of the tests - some might be sensitive enough to reveal subtle differences, but I suspect most are only capable of revealing gross differences. Obviously this will result in a much higher number of "dubious" null results lumped in with "valid" null results.
Now, I know the argument that null results don't prove anything, but the fact of the matter is that the more null results that are accumulated, for say blind testing of amplifiers, the more the people who are "on the fence" come to believe that there is no audible difference. This sets up a negative expectation bias which only internal controls will have any hope of picking up.
Jakob1983, on that HA thread, said something which I've long thought describes the best way to do such tests - invisibly, i.e. the participants don't know they are doing a test. This is extremely hard to organise, but it would deal with any negative expectation bias - there can be no expectation if the listener doesn't know what's being tested.
BTW, from examples I've seen, the power of this bias is stronger than sighted bias. There is a series of 4 get-togethers documented on the Pinfishmedia forum - 4 listening tests run over the course of a year or so. The main organiser of the get-togethers was a guy called Vital, & the threads can be found under the titles "DBO I" through "DBO IV". Over the course of the get-togethers they listened, sighted & blind, to different DACs. In total maybe 30-40 people were involved across all the tests, & maybe 10-15 DACs. Not until the final get-together did they hear a difference between DACs, & that seemed to be because someone at the event was able to point out what to listen for. Vital, whom I would call an open-minded objectivist, said that the blind tests' null results were part of the reason why he couldn't hear any differences. I just looked into that forum to check the names of the threads & saw an ABX thread on which he restated his opinion again today (I can't post a link):
"I think ABX shows us that the differences heard are much less than reported in sighted tests, but that the process can in itself hide the fact that differences do exist."
I think everyone should learn quite a lot from this.
One of the important things to realise from these meetings - & you can read it in their post-meeting write-ups - is that something was preventing them from hearing differences during sighted listening. Was it this negative expectation bias, or was it not knowing what to listen for? According to Vital, the blind testing was to blame, to a large extent. I suspect they were demotivated from even trying to hear differences.
Now, I don't expect home-run blind tests to read or follow the ITU recommended guidelines, so again I favour what Swerd said: "That's why there must be positive and negative controls in these tests." These internal controls, I believe, could allow us to get a handle on the validity of such tests. Without these controls we are seeing results with no way of evaluating how valid they are for revealing audible differences, or down to what level.
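As a rough sketch of how those internal controls could be scored alongside the actual comparison - the names & numbers below are purely illustrative (a hidden positive control with a difference known to be just audible, a hidden negative control with identical samples), not a prescription:

```python
# Illustrative only: score hidden control trials with the same one-sided
# binomial test as the device-under-test trials.
from scipy.stats import binom

def score(correct, total):
    """P(getting at least this many right by pure guessing)."""
    return binom.sf(correct - 1, total, 0.5)

session = {
    "positive control (known-audible difference)": (13, 16),
    "negative control (identical samples)":        (9, 16),
    "device under test (e.g. the two DACs)":       (8, 16),
}

for name, (correct, total) in session.items():
    print(f"{name}: {correct}/{total}, p = {score(correct, total):.3f}")

# Informal reading: if the positive control is NOT detected, a null result on
# the device under test says little about audibility at that level of
# difference; if the negative control IS "detected", something other than
# sound (cues, level mismatch, chance) is leaking through.
```

That way each run carries its own evidence of whether it was sensitive enough to be worth counting.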
It seems to me the only logical way to deal with the huge variability in the conditions under which these tests are run. If those conditions can't be controlled, then some way of verifying that the test can do what it is supposed to do is needed, i.e. a way of verifying that this particular run of the test is actually capable of revealing differences at an acceptable level.