Well, for one, in both the Stereophile and the NRC measurements you'll see that the sensitivity is taken from band-limited measurements. I think they both take sensitivity as an average from 300Hz to 3kHz.
Doing so removes any bass hump from the equation (so to speak), as well as any treble hump.
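To make the band-limiting concrete, here's a rough Python sketch. It assumes the response is sampled at log-spaced frequencies and that "sensitivity" is just the plain mean of the dB values between 300 Hz and 3 kHz. I'm not claiming that's exactly how Stereophile or the NRC compute it, and the bass and treble humps are made-up numbers.

```python
import math

def log_spaced(f_start, f_stop, n):
    """n frequencies from f_start to f_stop, evenly spaced on a log axis."""
    ratio = (f_stop / f_start) ** (1.0 / (n - 1))
    return [f_start * ratio**i for i in range(n)]

def band_limited_sensitivity(freqs, spl_db, f_lo=300.0, f_hi=3000.0):
    """Mean SPL (dB) of the points between f_lo and f_hi, inclusive.

    With log-spaced points, a plain mean weights each octave equally.
    """
    vals = [s for f, s in zip(freqs, spl_db) if f_lo <= f <= f_hi]
    if not vals:
        raise ValueError("no measurement points inside the band")
    return sum(vals) / len(vals)

# Hypothetical speaker: flat 87 dB, plus a +6 dB bass hump below 100 Hz
# and a +4 dB treble hump above 8 kHz.
freqs = log_spaced(20.0, 20000.0, 301)
spl = []
for f in freqs:
    level = 87.0
    if f < 100.0:
        level += 6.0
    if f > 8000.0:
        level += 4.0
    spl.append(level)

full_band_mean = sum(spl) / len(spl)
band_mean = band_limited_sensitivity(freqs, spl)
print(f"full-band mean:       {full_band_mean:.2f} dB")
print(f"300 Hz to 3 kHz mean: {band_mean:.2f} dB")
```

The 300 Hz to 3 kHz mean lands right at 87 dB, while the full-band mean gets pulled up by almost 2 dB from the humps.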
But for in-room measurements, there's another important reason to band-limit: unless he places each speaker in exactly the same spot while measuring/estimating sensitivity, room modes could give him dramatically different bass peaks from one speaker to the next. And that could throw off the estimate.
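A rough Python sketch of the room-mode issue, with made-up numbers: speaker A (86 dB) is measured in a spot that excites a hypothetical +10 dB room mode at 50 Hz, speaker B (88 dB) is measured somewhere that doesn't, and the level-matching trim is computed both from the full-band mean and from the 300 Hz to 3 kHz mean. Modeling the mode as a Gaussian bump on a log-frequency axis is purely an assumption for illustration.

```python
import math

def log_spaced(f_start, f_stop, n):
    """n frequencies from f_start to f_stop, evenly spaced on a log axis."""
    ratio = (f_stop / f_start) ** (1.0 / (n - 1))
    return [f_start * ratio**i for i in range(n)]

def band_mean(freqs, spl_db, f_lo=300.0, f_hi=3000.0):
    """Mean SPL (dB) of the points between f_lo and f_hi, inclusive."""
    vals = [s for f, s in zip(freqs, spl_db) if f_lo <= f <= f_hi]
    return sum(vals) / len(vals)

def room_mode(f, f0=50.0, gain_db=10.0, width_decades=0.1):
    """Hypothetical room-mode peak: a Gaussian bump on a log-frequency axis."""
    x = math.log10(f / f0) / width_decades
    return gain_db * math.exp(-0.5 * x * x)

freqs = log_spaced(20.0, 20000.0, 301)
# Speaker A sits where the mode is excited; speaker B does not.
spl_a = [86.0 + room_mode(f) for f in freqs]
spl_b = [88.0 for _ in freqs]

# Trim (dB) to apply so the two speakers play at the same level.
full_band_trim = sum(spl_b) / len(spl_b) - sum(spl_a) / len(spl_a)
band_trim = band_mean(freqs, spl_b) - band_mean(freqs, spl_a)
print(f"trim from full-band mean:       {full_band_trim:.2f} dB")
print(f"trim from 300 Hz to 3 kHz mean: {band_trim:.2f} dB")
```

With these numbers the full-band trim comes out around 1.2 dB instead of the true 2 dB, while the band-limited trim stays at 2 dB because the mode has essentially no energy between 300 Hz and 3 kHz.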
Note that this is only about level-matching. I didn't mean that he should also audition/compare them with the high-pass crossover set high. If they're similar in size and sensitivity, they probably have similar bass extension, so it's mostly a non-issue here.