The virgin speaker must remain in a factory-sealed carton, and the broken-in unit must be broken in under the observation of a qualified audiophile referee, all certified by Price Waterhouse accountants. The accountants will also perform a Quality Assurance audit of the Cambridge factory to determine how much play time, if any, these speakers received prior to shipping.
Is there any data on how long it takes to break in the speakers? Is 50 hours long enough?
All your other audio gear must be certified as state-of-the-art or better.
Your test rig microphone must have recent documentation certifying it was calibrated using a NIST-traceable source.
Perform blinded listening comparisons of virgin vs. virgin (A-A), and broken-in vs. broken-in (B-B), as well as virgin vs. broken-in (A-B, the experimental question). The false positive rate for the test population will be the percentage of listeners who report hearing a difference in the A-A or B-B comparisons. Don't expect that to be 0%.
The false negative rate can be estimated by determining how many listeners cannot hear a difference between a speaker wired normally and one with the tweeter wired out of phase with the woofer. How well they do with that known difference will be directly compared to how well they do in the virgin vs. broken-in comparison.
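For anyone who wants to tally the controls, here is a rough sketch of the bookkeeping. The function names and the list-of-booleans data layout are my own assumptions (True means the listener reported hearing a difference), and pooling A-A with B-B into one false positive rate is just one reasonable choice.

```python
def rate(responses):
    """Fraction of listeners who reported hearing a difference."""
    return sum(responses) / len(responses)


def summarize(aa, bb, control, ab):
    """Summarize the control and experimental comparisons.

    aa, bb  : identical-pair trials (any "difference" is a false positive)
    control : tweeter-out-of-phase trials (a real, known difference)
    ab      : virgin vs. broken-in trials (the experimental question)
    All arguments are hypothetical lists of booleans, one per listener.
    """
    return {
        # Pooled A-A and B-B trials: listeners hearing things that aren't there.
        "false_positive_rate": rate(aa + bb),
        # Listeners who missed the known out-of-phase difference.
        "false_negative_rate": 1.0 - rate(control),
        # The number everyone is actually arguing about.
        "ab_difference_rate": rate(ab),
    }
```

The A-B rate only means something when it is read against the other two numbers: it has to clear the false positive rate, and the out-of-phase control tells you how much a real difference could be getting missed.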
Test enough listeners, at least 100 (300 is better), to provide results with unequivocal statistical significance. N=6 ain't gonna cut it. The editors of The Journal of Golden Ear Trivia will insist on statistical analysis with at least a 95% confidence level.
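To put a number on "statistical significance", one option is an exact one-sided binomial test. This is a sketch under assumptions, not the one true analysis: `p0` is the chance that a listener reports a difference when there is none, which should really come from the A-A and B-B controls, and the function names are made up for illustration.

```python
from math import comb


def binomial_p_value(k, n, p0):
    """One-sided exact p-value: P(X >= k) when X ~ Binomial(n, p0).

    k  : listeners who reported a difference in the A-B comparison
    n  : total listeners tested
    p0 : assumed rate of "difference" reports when none exists
    """
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def is_significant(k, n, p0, alpha=0.05):
    """True if the result clears the 95% confidence bar (alpha = 0.05)."""
    return binomial_p_value(k, n, p0) < alpha
```

With p0 = 0.5, 5 listeners out of 6 gives a p-value of about 0.11, which does not clear the bar, while 60 out of 100 does. That is the arithmetic behind wanting a big panel.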
To avoid unnecessary effort, you can adopt an early stopping rule for futility. After the first 24 listeners, if fewer than 51% can identify an audible difference between the virgin and broken-in speakers, the test may be stopped early.
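The futility rule is simple enough to write down directly. A minimal sketch, assuming responses arrive as a list of booleans in test order (True means the listener identified a difference); the function name and parameters are hypothetical.

```python
def should_stop_for_futility(responses, look_at=24, threshold=0.51):
    """Apply the early stopping rule after the first `look_at` listeners.

    Returns True if the test may be stopped early, i.e. fewer than
    `threshold` of the first `look_at` listeners identified a difference.
    Returns False if not enough listeners have been tested yet.
    """
    if len(responses) < look_at:
        return False  # too early for the interim look
    first = responses[:look_at]
    return sum(first) / look_at < threshold
```

With 24 listeners, 51% works out to 12.24, so in practice the rule stops the test at 12 or fewer positive identifications and continues at 13 or more.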
Is that enough? I could go on.