I have been following this thread with interest, if with a variable degree of attention. Since it is now 60 years since I built my first speaker, I thought I would throw in my few cents' worth about the issues raised here.
First of all the disclaimers.
I'm not the least bit interested in reproducing amplified instruments and vocalists. My interest is solo instruments from the lute and guitar to the largest pipe organs and everything in between. I want perfection in the reproduction of the human voice, spoken, and sung, solo and choral. This latter is the toughest assignment of all.
But I'm greedy, I want it all. I want the delicacy of the lute, focused and playing to me in the room, all the way to opera and Wagner, and the largest choral works with huge orchestra, massed choirs, soloists and huge pipe organs. I want this reproduced cleanly at concert hall levels.
I believe you can train your acoustic memory. You can develop a real and genuine reference point for natural instruments and voices. You can also develop a good memory for venues you frequent.
So I started experimenting. Driver behavior back then was a particular problem. Nothing was known of Thiele/Small parameters until later.
Two speakers impressed for tonal accuracy. First was the original Quad electrostatic loudspeaker.
The next was in 1959 when this 4" aluminum full range driver appeared. The JW module.
Both these speakers sounded remarkably similar tonally. Both could generate similar SPLs.
The JW had significant mechanical shortcomings, which I set about rectifying.
With these units Ted Jordan defied, and still defies, current theory and wisdom. The cone is metal, but very thin and light. It is basically foil and far from rigid. Its radiating area decreases with increasing frequency in a controlled and predictable manner. The essence of this driver is absence of rigidity and predetermined flexibility.
The driver has a remarkably flat response. In a TL it gives useful performance into the 40 Hz range and goes all the way to 20 kHz. There are zero sharp unpleasant breakup modes. The units sound quite unlike other full rangers, with a lovely smooth laid back effect creating a very believable sound stage.
Afterlife 2 has been using a pair of these drivers for a couple of years now and loves them. I made a unit available to Fuzz for his center channel and he loves it. Walter has heard it and was astonished at what he heard from that single 4" driver.
I used these drivers for almost 20 years. It took me that long to develop multi driver speakers that could best them. I still have a couple of those drivers in TLs to keep my designs honest.
So, what do I want from a set of speakers? I want tonal accuracy, which translates to a smooth response, and correct balance of all frequency bands, rendered in a believable acoustic space relevant to the program.
Special mention must be made about the bass. I want the bass non resonant. I don't want the speaker to advertise its bass. I want the speaker to be articulate enough to hear what type of sticks the timps are being struck with, and I want each drum clearly articulated. I want organ pedal runs clearly articulated. No trace of bloat tolerated.
I have, I think, experimented with and built speakers using just about every type of bass loading. Here I have come to a definite conclusion. Transmission line and horn loading fit the ideal closer than anything else. Unfortunately these designs are the most difficult. For me horns are not practical in the home, due to the size required for extended bass. For large auditorium designs bass horns are my preferred option. For domestic designs and studios, the TL.
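To put rough numbers on the size question, here is a minimal Python sketch. It is not a design tool. It assumes a speed of sound of 343 m/s, an unstuffed quarter-wave line, and the common rule of thumb that a horn mouth radiating into full space wants a circumference of about one wavelength at cutoff. The function names are made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 C (assumed)


def quarter_wave_length_m(tuning_hz: float) -> float:
    """Length of an unstuffed line with its quarter-wave resonance at tuning_hz."""
    return SPEED_OF_SOUND / (4.0 * tuning_hz)


def horn_mouth_diameter_m(cutoff_hz: float) -> float:
    """Rule-of-thumb mouth diameter: circumference equal to one wavelength at cutoff."""
    wavelength = SPEED_OF_SOUND / cutoff_hz
    return wavelength / math.pi


if __name__ == "__main__":
    for f in (30.0, 40.0, 60.0):
        print(f"{f:5.0f} Hz: line {quarter_wave_length_m(f):.2f} m, "
              f"horn mouth {horn_mouth_diameter_m(f):.2f} m across")
```

At 40 Hz this works out to a line of about 2.1 m, which can be folded into a cabinet, against a horn mouth of about 2.7 m across. Stuffing in a real line and corner placement of a real horn both reduce these figures, so treat them as upper-end estimates only.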
So how do I set about a design?
Like organ builders I determine the compass of the speaker. That is to say the frequency range to be covered. Then the power output required/desired. Both of these relate closely to budget.
Physical size will influence the choice of loading as will driver parameters.
The next step is to look for drivers that will work well together with the simplest crossover possible. I'm really not doctrinaire about cone material. I have worked with many: paper, polypropylene, Bextrene and other plastics including polystyrene, and of course metal. I have never used Kevlar, but only because a driver using Kevlar has never been selected. I would not rule out using a Kevlar speaker though. I'm not keen on undoped paper cones. I have a slight preference for metal cone drivers, but soft dome tweeters. I don't especially favor ribbons, because I think there are so many good dome tweeters out there that ribbons are not worth the trouble and expense.
As to the number of drivers, my opinion on that is no more than necessary. I always try and use the minimum number of drivers commensurate with covering the compass of the speaker and the power demands. The ideal speaker would actually be a high powered single full range driver.
My full range driver era profoundly influences my designs and always will. The more drivers you add the more the drivers interfere, and you create troublesome lobing issues. Far more often than not a speaker with a large driver count is a lousy speaker.
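One way to see why spacing matters for lobing is to compare the center-to-center distance between two drivers with the wavelength at the crossover frequency. The sketch below is only an illustration. It assumes 343 m/s for the speed of sound and uses the commonly quoted guideline that spacing should stay within about one wavelength at crossover. The function names and the example figures are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 C (assumed)


def wavelength_m(freq_hz: float) -> float:
    """Wavelength in air at freq_hz."""
    return SPEED_OF_SOUND / freq_hz


def spacing_in_wavelengths(spacing_m: float, crossover_hz: float) -> float:
    """Driver center-to-center spacing expressed in wavelengths at crossover."""
    return spacing_m / wavelength_m(crossover_hz)


if __name__ == "__main__":
    # Hypothetical example: drivers 15 cm apart, crossed at 2 kHz.
    ratio = spacing_in_wavelengths(0.15, 2000.0)
    print(f"spacing = {ratio:.2f} wavelengths at crossover")
```

The larger that ratio gets, the more the two drivers' outputs move in and out of phase as you shift off axis, and every added driver adds more such pairs.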
A word about line arrays. My experience at Kneller Hall and a large auditorium installation in Canada using huge line arrays of JW modules crossed over active to bass horns at 400 Hz, was strongly positive. It was far superior to any compression loaded horn systems.
In the domestic environment I have not been successful in getting the presentation I desired. My impression is that you need to really get way back from line arrays.
In any multi driver system, the crossover is the very heart of the system, and is mainly responsible for the way a speaker presents itself. With careful design you can use speakers with different cone materials. My mains use metal cone drivers, the center polypropylene. Yet there is no change in tonality as singers move across the stage. The front stage is totally seamless.
Now I have a hard time devoting a power amp to drive a tweeter in a domestic system. In addition to expense, tweeters are easily damaged by amp problems, which makes a series cap mandatory anyway. For most designs, with the crossover above 2 kHz, insertion loss is acceptable. However you must choose drivers that don't have big dips in response. A passive network cannot apply boost, it can only cut. So if you have a response dip, you have to cut either side of it. If you must work with this situation, an active network is best.
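For a feel of the size of that series cap, here is a minimal sketch using the textbook first-order high-pass formula C = 1 / (2 * pi * f * R). It assumes the tweeter behaves as a pure resistance at its nominal impedance, which real tweeters do not, so the result is only a starting value. The function name is hypothetical.

```python
import math


def series_cap_uF(corner_hz: float, nominal_ohms: float) -> float:
    """Series capacitor (microfarads) for a first-order high-pass into a resistive load."""
    farads = 1.0 / (2.0 * math.pi * corner_hz * nominal_ohms)
    return farads * 1e6


if __name__ == "__main__":
    for f in (2000.0, 3000.0, 5000.0):
        print(f"{f:6.0f} Hz into 8 ohms: {series_cap_uF(f, 8.0):.2f} uF")
```

For 2 kHz into a nominal 8 ohms this gives a little under 10 uF. A cap fitted purely for protection ahead of an active crossover would normally be chosen with a corner well below the crossover frequency, so it does not interact with the filter.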
For reference systems I believe low crossover points should be active. Using active baffle step compensation confers huge advantages in optimization.
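As a rough guide to where baffle step compensation has to act, a widely quoted approximation puts the midpoint of the step at about 115 divided by the baffle width in meters. The sketch below just evaluates that rule of thumb. It is an approximation for a simple flat baffle, the theoretical step is about 6 dB, and rooms usually call for less than the full amount. The function name is hypothetical.

```python
def baffle_step_f3_hz(baffle_width_m: float) -> float:
    """Approximate baffle step midpoint frequency from the 115 / width rule of thumb."""
    if baffle_width_m <= 0:
        raise ValueError("baffle width must be positive")
    return 115.0 / baffle_width_m


if __name__ == "__main__":
    for width in (0.20, 0.25, 0.40):
        print(f"{width:.2f} m baffle: step centered near {baffle_step_f3_hz(width):.0f} Hz")
```

A 0.25 m wide baffle puts the step near 460 Hz. Doing the correction actively lets the shelf frequency and depth be adjusted by ear and measurement without rebuilding a passive network.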
Active solutions also allow the blending and control of multiple signal bands to one driver.
Now modelling gets you so far. Then you get down to measurements and listening.
The DIY constructor has a huge advantage here. You have unlimited time for the always required revisions. I follow the plan of the late John Wright of TDL here. I correct serious errors promptly. Then I listen for extended periods before making other changes. Then I look hard for the trouble and correct it. John Wright made these corrections at 3 to 4 month intervals, and that is what I do. Never dismiss what bugs you. Your senses about this will be pretty much infallible. In the end you get to the point where the system is doing everything you want.
This speaker system was first set up in 2006. I made my last revision about two years ago. I have felt no further need for revisions.