That's interesting. I don't think I've read that about the mid band before, but it makes sense considering how important clear vocals are.
@erpauls, a little more info might help. Do you already have amplification, and if so, what make and model? What geographical area are you in, and are there particular brands you can audition locally?
It really boils down to that crucial speech discrimination band from 400 Hz to 3.5 kHz, and to some extent out to 5 kHz. The ear is incredibly sensitive to aberrations in that region.
As you probably know, I'm in the geriatric set, and I will soon have lived through three-quarters of a century of audio. So I go back to before accurate audio measurement, before Thiele/Small, and certainly before computer modelling and measurement programs. However, back then the BBC played a pivotal role in audio development, especially in loudspeakers. You really cannot do any creditable audio production without accurate speakers. So in those days, trial and especially error was your teacher.
Anyhow, the BBC found that a decent, smooth audio perspective across the sound field could be enhanced by a frequency-response dip centered around 3 kHz, running from about 1 kHz to 4 or 5 kHz. This became known as the "BBC Smiley." It was highly effective. With later improvements, especially more data, modelling, and more accurate measurements, it was found that you did not need that dip, but the takeaway was that a rise in that band was highly deleterious. A rise produces a forward image, makes it impossible to ignore that you are listening to speakers, and destroys the illusion of perspective. So, in fact, keeping that region ultra flat is the best policy. However, it is still true that a dip is far preferable to the slightest peak in that region.
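For anyone who wants to see what such a dip looks like numerically, here is a minimal sketch. It is not meant to reproduce any particular BBC design; it simply models the dip as a standard analog peaking filter with negative gain, using the prototype from Robert Bristow-Johnson's Audio EQ Cookbook. The 3 kHz centre, -3 dB depth and Q of 0.7 are my own assumptions, chosen to roughly cover the 1 kHz to 4-5 kHz region.

```python
import math

def peaking_eq_db(f_hz, f0_hz=3000.0, gain_db=-3.0, q=0.7):
    """Magnitude in dB of an analog peaking/dip filter (RBJ cookbook prototype).

    Assumed parameters: centre f0_hz, depth gain_db (negative = dip), width q.
    """
    a = 10 ** (gain_db / 40.0)   # RBJ "A": square root of the linear gain
    s = 1j * f_hz / f0_hz        # complex frequency normalised to the centre
    num = s * s + s * (a / q) + 1
    den = s * s + s / (a * q) + 1
    return 20 * math.log10(abs(num / den))

# Print the assumed dip at a few frequencies across the midrange
for f in (500.0, 1000.0, 2000.0, 3000.0, 5000.0, 10000.0):
    print(f"{f:7.0f} Hz: {peaking_eq_db(f):6.2f} dB")
```

At the centre frequency the function returns the full -3 dB, and it tends back to 0 dB well away from 3 kHz. Because the analog prototype is symmetric on a log-frequency axis, 1 kHz and 9 kHz get exactly the same attenuation.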
More bad news: it is in just this region that crossover points almost always occur, and these can, and often do, introduce a whole raft of problems, from frequency-response to time-domain aberrations. In my opinion, crossovers in that sensitive range should be avoided if at all possible.
Even so, I'm not beyond using the old BBC Smiley in moderation, especially for two-channel listening.
This is from the speakers in my two-channel system in the family room.
The crossover points are 400 Hz and 4 kHz. Because of the driver characteristics, high-order electrical crossover slopes are required; they are fourth-order Linkwitz-Riley (LR4). The BBC Smiley is about 3 dB deep, referenced to 500 Hz.
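A quick illustration of why Linkwitz-Riley alignments are popular: an LR4 section is a second-order Butterworth squared, and with ideal filters the low-pass and high-pass outputs are each 6 dB down at the crossover frequency and sum back to a flat magnitude. The sketch below checks that at the two crossover points quoted above. It assumes ideal electrical filters and perfectly flat, time-aligned drivers, which real drivers never are, so treat it as illustration only.

```python
import math

SQRT2 = math.sqrt(2.0)

def lr4(f_hz, fc_hz):
    """Return (low-pass, high-pass) complex responses of an ideal LR4 pair."""
    s = 1j * f_hz / fc_hz                  # complex frequency normalised to fc
    b2 = (s * s + SQRT2 * s + 1) ** 2      # LR4 = 2nd-order Butterworth squared
    return 1 / b2, s ** 4 / b2

def db(x):
    """Magnitude of a complex response in dB."""
    return 20 * math.log10(abs(x))

# Level of each section and of their sum, evaluated at each crossover frequency
for fc in (400.0, 4000.0):
    lp, hp = lr4(fc, fc)
    print(f"fc={fc:.0f} Hz: LP {db(lp):.2f} dB, HP {db(hp):.2f} dB, "
          f"sum {db(lp + hp):.2f} dB")
```

The flat magnitude sum hides the fact that the combined output is an all-pass rather than a straight wire, so its phase still rotates through the crossover region.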
However, the impulse response shows the deleterious effect of high-order crossovers in the time domain: significant inter-driver delays stretching over 0.5 ms.
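To give a feel for the size of the time effect, here is a rough sketch that estimates the group delay of an ideal LR4 pair's summed output (an all-pass) at each crossover frequency. It again assumes textbook electrical filters and ignores driver offsets, driver phase and the room, so it will not match a measured impulse response. It only shows that, for ideal filters, the 400 Hz crossover alone contributes roughly a millisecond of delay at and below 400 Hz (the closed form there is 2√2 / (2π·fc)), while the 4 kHz crossover contributes about a tenth of that.

```python
import cmath
import math

SQRT2 = math.sqrt(2.0)

def lr4_sum(f_hz, fc_hz):
    """Summed output of an ideal LR4 low-pass + high-pass pair (an all-pass)."""
    s = 1j * f_hz / fc_hz
    b2 = (s * s + SQRT2 * s + 1) ** 2
    return (1 + s ** 4) / b2

def group_delay_s(f_hz, fc_hz, rel_step=1e-4):
    """Group delay -dphi/domega in seconds, by central difference."""
    df = f_hz * rel_step
    # Taking the phase of the ratio avoids the +/-pi wrap that sits exactly at fc
    dphi = cmath.phase(lr4_sum(f_hz + df, fc_hz) / lr4_sum(f_hz - df, fc_hz))
    return -dphi / (2 * math.pi * 2 * df)

# Delay well below, at, and well above each crossover frequency
for fc in (400.0, 4000.0):
    for f in (50.0, fc, 10 * fc):
        print(f"fc={fc:.0f} Hz, f={f:.0f} Hz: {group_delay_s(f, fc) * 1e3:.3f} ms")
```

The phase-of-ratio trick is there because the LR4 sum passes through exactly -180° at the crossover frequency, where a naive difference of two phase readings would jump by 2π.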
I'm not sure the HF rise is real. The tweeter is a Mylar planar device, and I think the issue is that it affects the usual 1 meter measurement taken on the tweeter axis, as the speakers do not sound like that curve would suggest. In any event, the rise is only 3 dB at 6 kHz, referenced to 500 Hz. There is also a rise of 6 dB between 30 and 40 Hz, which I think is due to position and room. I think that one is real, as there is a definite hint of increased warmth. Modelled F3 is that Hz, but that quirk extends the response to 20 Hz.
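For anyone less used to decibels, the figures above convert to plain ratios as follows. This is just the standard definitions (20·log10 for sound pressure, 10·log10 for power), nothing specific to these speakers.

```python
def db_to_pressure_ratio(db):
    """Sound-pressure (amplitude) ratio corresponding to a level change in dB."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Acoustic power ratio corresponding to a level change in dB."""
    return 10 ** (db / 10)

# The 3 dB treble rise and the 6 dB bass rise mentioned above
for change_db in (3.0, 6.0):
    print(f"{change_db:+.0f} dB -> pressure x{db_to_pressure_ratio(change_db):.2f}, "
          f"power x{db_to_power_ratio(change_db):.2f}")
```

So the 6 dB bass rise is roughly double the sound pressure (about four times the power), while the 3 dB treble rise is about 1.4 times the pressure (about double the power).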
I'm actually very fond of those speakers; they give a good account throughout the space. Speech and vocals are clear and natural. TV watching is not a problem, and you can place yourself anywhere you want. It's a great place to spin vinyl by the fireplace in the winter months.
All speakers have some trade-offs. In general, peaks tend to be among the worst of speaker ills.