Amplitude Sensitivity of Human Hearing

When listening test data from a small number of trials or test subjects (as in the Koya study) is analyzed at the conventional significance level (α = 0.05), two kinds of error must be considered: Type 1, concluding that inaudible differences are audible; and Type 2, concluding that audible differences are inaudible. In small studies such as this one, the α = 0.05 significance level usually makes the type 2 error larger than the type 1 error. Since it is desirable to keep both errors as small as possible, equalizing them usually means reducing the type 2 error.

There are three ways of reducing type 2 error in a listening test:

1. Increase N (the number of trials). This is the preferred, but not always available, method of decreasing type 2 error.
2. Increase p (the probability that a listener correctly identifies a real difference). Careful test design can increase this.
3. Increase type 1 error. A method of last resort, but it may be necessary to keep type 2 error manageable when the p value of interest is only slightly above chance.
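The effect of the first option can be sketched numerically. The following is a minimal sketch, assuming each trial is an independent two-choice judgment (chance = 0.5) so that the number of correct responses is binomially distributed; the function names and the sample sizes 50 and 100 are illustrative and not taken from the article.

```python
from math import comb

def upper_tail(n, r, p):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(r, n + 1))

def critical_r(n, alpha=0.05):
    """Smallest r whose type 1 error, P(X >= r | p = 0.5), does not exceed alpha."""
    for r in range(n + 1):
        if upper_tail(n, r, 0.5) <= alpha:
            return r
    return n + 1  # no r in 0..n meets the significance level

def type2_error(n, r, p):
    """P(X < r) when the listener's true hit rate is p."""
    return 1 - upper_tail(n, r, p)

# Hold alpha at 0.05 and p at 0.6, then vary the number of trials.
for n in (15, 50, 100):
    r = critical_r(n)
    print(n, r, round(upper_tail(n, r, 0.5), 4), round(type2_error(n, r, 0.6), 4))
```

For N = 15 this picks r = 12; the printed type 2 error falls as N grows while the type 1 error stays at or below 0.05.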

Equalizing the probabilities of the two error types for a particular p value of interest requires a metric by which the degree of equalization can be assessed. In this case the fairness coefficient, FCp, provides a useful figure of merit: it is the ratio of the smaller error probability to the larger, where α is the type 1 error and β_p is the type 2 error at that p:

$$FC_p = \frac{\min(\alpha, \beta_p)}{\max(\alpha, \beta_p)}$$

Here, FCp = 1 represents an ideal, perfectly fair study.

From: Leventhal, L.: "Type 1 and Type 2 Errors in Statistical Analysis of Listening Tests," Journal of the Audio Engineering Society, Vol. 34, pp. 437-453 (1986 Jun.) we have the following table:


<pre>N   r   Type 1 error (α)    Type 2 error (β)
        actual value        p = 0.6   p = 0.7   p = 0.75  p = 0.8
15  14  0.0005              0.9948    0.9647    0.9198    0.8329
    13  0.0037              0.9729    0.8732    0.7639    0.6020
    12  0.0176              0.9095    0.7031    0.5387    0.3518
    11  0.0592              0.7827    0.4845    0.3135    0.1642
    10  0.1509              0.5968    0.2784    0.1484    0.0611
     9  0.3036              0.3902    0.1311    0.0566    0.0181
     8  0.5000              0.2131    0.0500    0.0173    0.0042
     7  0.6964              0.0950    0.0152    0.0042    0.0008
     6  0.8491              0.0338    0.0037    0.0008    0.0001</pre>
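The table entries appear to be binomial tail probabilities for N = 15. The sketch below regenerates them under that assumption: type 1 error is taken as the chance of scoring r or more correct by guessing (p = 0.5), and type 2 error as the chance of scoring fewer than r correct at the listed p. The function names are illustrative.

```python
from math import comb

def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def type1_error(n, r):
    """Chance of r or more correct out of n by pure guessing (p = 0.5)."""
    return sum(binom_pmf(n, k, 0.5) for k in range(r, n + 1))

def type2_error(n, r, p):
    """Chance of fewer than r correct when the true hit rate is p."""
    return sum(binom_pmf(n, k, p) for k in range(r))

N = 15
P_VALUES = (0.6, 0.7, 0.75, 0.8)

# One printed row per criterion r, in the same order as the table.
for r in range(14, 5, -1):
    row = [type1_error(N, r)] + [type2_error(N, r, p) for p in P_VALUES]
    print(f"{r:2d}  " + "  ".join(f"{v:.4f}" for v in row))
```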


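Selecting r = 9 for p = 0.6 gives a type 1 error of 0.3036 and a type 2 error of 0.3902 in the table. Taking FCp as the ratio of the smaller error to the larger:

$$FC_{0.6} = \frac{0.3036}{0.3902} \approx 0.7781$$

Keeping in mind that an FCp value of 1 is the ideal, 0.7781 indicates a high degree of fairness.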
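The arithmetic can be checked with a few lines of code. This is a minimal sketch that assumes FCp is the smaller error probability divided by the larger, which is consistent with the 0.7781 figure quoted here; the function name is illustrative and the inputs are read directly from the table.

```python
def fairness_coefficient(alpha, beta):
    """Ratio of the smaller error probability to the larger (1.0 = perfectly fair)."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("error probabilities must be positive")
    return min(alpha, beta) / max(alpha, beta)

# N = 15, r = 9, p = 0.6: type 1 error 0.3036, type 2 error 0.3902
print(f"{fairness_coefficient(0.3036, 0.3902):.4f}")  # 0.7781

# N = 15, r = 12, p = 0.6: the criterion the conventional alpha = 0.05 would pick
print(f"{fairness_coefficient(0.0176, 0.9095):.4f}")
```

The second call shows why the conventional significance level is considered unfair for this sample size: the coefficient drops to roughly 0.02.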

As to why the researchers chose the p = 0.6 value, we find in the text:

"Although a p of 0.6 may seem as a low criterion, it was chosen so that the subtle effects of the audibility of phase distortion were uncovered in the analysis. Therefore, for this study, anything above 9 correct responses (r) out of 15 will be considered statistically significant for p = 0.6".

References
Koya, Daisuke: "Aural Phase Distortion Detection", Masters dissertation, Master of Science in Music Engineering Technology, University of Miami, Coral Gables, Fla., May 2000.

Leventhal, L.: "Type 1 and Type 2 Errors in Statistical Analysis of Listening Tests," Journal of the Audio Engineering Society, Vol. 34, pp. 437-453, June 1986.

- Posted for Mark Sanfilipo (inserted chart and formulas)
 
mtrycrafts

Seriously, I have no life.
Yes, but that fairness coefficient can be much closer to 1 with a bit higher p value. Say 10 responses and p=.75. The fairness = .983, not that .7781
 
pikers

Audioholic
mtrycrafts said:
Since this thread is ongoing, I re-read the original articles and the Daisuke Koya article discussed. This citation has been discussed elsewhere on the net in the past.
You may want to ask Mark S, the author to check the statistics used in the Koya paper, specifically the author's acceptance of only a 69% confidence level, unheard of in science. The minimum is 95% confidence level.
I would highly doubt his audibility conclusions based on such a piss poor confidence level.
Which raises the question...

If using human hearing to quantify differences in, say, cabling, does the frequency of repeatability really have anything to do with an actual difference? Or, is it when you reach a certain level you begin to discount the results as random chance, like flipping a coin?

I would attribute the inability to reliably conquer all DBT testing to listener distraction, not a quantifiable difference. :cool:
 
HDOM

Audioholic Intern
maybe someone should make a new format with maximum 70 dB, i guess it would be great
Duration per day, hours | Sound level, dBA (slow response)
------------------------|---------------------------------
8                       | 90
6                       | 92
4                       | 95
3                       | 97
2                       | 100
1 1/2                   | 102
1                       | 105
1/2                     | 110
1/4 or less             | 115
 
Steve81

Audioholics Five-0
maybe someone should make a new format with maximum 70 dB, i guess it would be great
To what end? You need adequate dynamic range to capture high level, short term peaks as well as low level noises. Keeping SPL at a safe level is up to the person in charge of the volume knob.
 