Is level matching required to do proper A/B tests?
Yes, absolutely. Human hearing is such that a slight difference in volume is perceived as a difference in tone. As the volume decreases, the bass and treble subjectively seem to fade faster than the midrange. Thus, two otherwise identical speakers playing at very slightly different volumes will appear to produce different amounts of bass and treble. This, by the way, is what those "Loudness compensation" switches on older gear were all about: when listening at a low volume, one could boost the bass and treble to try to compensate for this peculiarity of human hearing.
This also explains why all gear must be level matched during A/B tests, or the test has no validity. Since virtually no one does this, people's opinions about the relative sound quality of different pieces of equipment are almost always worthless.
I have the 4 pairs of contenders in my sig. I have one pair connected to my A terminals and another connected to the B terminals, so I can switch between them easily, but I can't level match them. So far I've been matching the volume by ear. I thought I was doing well until I tried to A/B the 2030P & TSBL. Now it seems like I could have been way off this whole time.
I'm thinking I can use my receiver's test tones (pink noise) and an SPL meter to measure the level of one pair in decibels, then match the other pair to it by adjusting the volume. Then, when I switch between them, I adjust the volume by however many clicks it takes to match up. Would this work?
It will probably work okay. However, I am not an expert on these matters; although I know a lot more about it than the average person, that does not make me an expert.
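For the "how many clicks" step, the arithmetic is just the SPL difference divided by the size of one volume step. Below is a minimal sketch of that calculation, assuming each click changes the level by a fixed 0.5 dB. That step size is an assumption; check what your own receiver's display shows. The function names are hypothetical, made up for this illustration. It also reports the leftover mismatch, since a whole number of clicks will not always land exactly on the target.

```python
def clicks_to_match(spl_a_db: float, spl_b_db: float, db_per_click: float = 0.5) -> int:
    """Clicks to add (positive) or remove (negative) so pair B matches pair A.

    Assumption: every volume click changes the level by db_per_click dB.
    """
    if db_per_click <= 0:
        raise ValueError("db_per_click must be positive")
    difference = spl_a_db - spl_b_db  # positive means pair B is quieter than pair A
    return round(difference / db_per_click)  # nearest whole click


def residual_mismatch(spl_a_db: float, spl_b_db: float, db_per_click: float = 0.5) -> float:
    """Level difference (dB) still left after applying the whole-click correction."""
    clicks = clicks_to_match(spl_a_db, spl_b_db, db_per_click)
    return (spl_a_db - spl_b_db) - clicks * db_per_click


# Example: pink noise reads 75.0 dB on pair A and 73.5 dB on pair B at the same volume setting.
print(clicks_to_match(75.0, 73.5))    # 3 clicks up when switching to pair B
print(residual_mismatch(75.0, 73.5))  # 0.0 dB left over
```

Measure from the listening position with the meter in the same spot for both pairs. If the residual is not close to zero, the receiver's step size is too coarse to match the pairs exactly.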
My guess, for what it's worth, is that you will find one speaker better in some ways and another better in other ways. I remember auditioning speakers many years ago and being frustrated that one kind of drum sounded more real on one speaker, while another kind of drum sounded more real on another. I liked one better overall, but not in every way.