While I've already expressed my displeasure with Audyssey, along with many other room correction systems, I don't think this is a fair statement. The Harman work did, in fact, show that the corrected response was preferred. That correction is still being applied to the response; it just happens to involve a lot more human intervention than the automatic room correction systems do. EQing the low frequencies to remove peaks in the bass caused by boundary interference effects and room modes is absolutely necessary in my opinion. Automatic correction schemes can do this, and do it well, improving the perceived sound.
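As a minimal sketch of that kind of low-frequency correction, here's a single parametric cut built from the standard RBJ cookbook peaking-biquad formulas. The 45 Hz center frequency, -8 dB depth, and Q of 4 are made-up example values standing in for a measured mode peak, not numbers from any particular system:

```python
import numpy as np
from scipy.signal import lfilter

def peaking_eq(f0, gain_db, q, fs):
    """RBJ cookbook peaking biquad; a negative gain_db cuts a room-mode peak."""
    a = 10 ** (gain_db / 40.0)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * a, -2 * np.cos(w0), 1 - alpha * a])
    d = np.array([1 + alpha / a, -2 * np.cos(w0), 1 - alpha / a])
    return b / d[0], d / d[0]

fs = 48000
# Hypothetical example: an 8 dB peak measured at 45 Hz gets a matching narrow cut.
b, a = peaking_eq(f0=45.0, gain_db=-8.0, q=4.0, fs=fs)
corrected = lfilter(b, a, np.random.randn(fs))  # stand-in for one second of audio
```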
The problem is how they operate, how the algorithm makes judgment calls. One of the bigger problems I find is that they apply a flat room curve, which is not typically desirable; it often makes the system sound thin. They also often apply correction in the mids and highs in ways they shouldn't. If the speaker has a really good power response, then the EQ has an easier job: the algorithm doesn't have to make any smart judgment calls. Most speakers do not have such an ideal power response, so the algorithm has to make judgment calls with far too little information, and it often makes mistakes, correcting things in ways that are audibly inferior (and which the end user is likely to have no idea about).
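To make the flat-curve complaint concrete, here's a toy comparison between a flat in-room target and a gently tilted one. The 1 dB/octave slope below is purely an illustrative assumption, not a value from any published preference study:

```python
import numpy as np

freqs = np.logspace(np.log10(20), np.log10(20000), 200)  # 20 Hz to 20 kHz

# Flat in-room target: what many auto-EQ systems aim for by default.
flat_target = np.zeros_like(freqs)

# Tilted target: gentle downward slope from bass to treble (assumed ~1 dB/octave
# here purely for illustration), which tends to sound fuller than flat.
octaves_above_20hz = np.log2(freqs / 20.0)
tilted_target = -1.0 * octaves_above_20hz  # dB, relative to the level at 20 Hz

# Compare the two targets near 10 kHz to see how much leaner "flat" ends up.
i = int(np.argmin(np.abs(freqs - 10000)))
print(f"Target at {freqs[i]:.0f} Hz: flat {flat_target[i]:.1f} dB, "
      f"tilted {tilted_target[i]:.1f} dB")
```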
I'm still working on an article around this topic for Audioholics. I've delayed the article while I spend some time revisiting a lot of the classic literature. I already posted much of the theory and research into this concept in my Mic article thread. The key remains that the mic doesn't pick up what our ears can hear. No matter how reliable the mic is, if the software is making corrections based on faulty (but consistent) information, then the corrections will be wrong. The in-room response captured by an omnidirectional microphone is a lot different from what we actually hear tonally. You might think you can simply account for this, but it's actually dependent on the speaker's dispersion. It's further complicated by the fact that this interacts with the room itself. How much of the reflected energy is absorbed or dissipated affects the response as captured by the mic, which still differs from the tonal balance that we hear. At low frequencies, no matter how much you absorb, some bass will still reflect, and the dimensions of any normal listening room are physically small relative to the wavelength at 100 Hz or below (especially below), meaning we still hear those reflections as part of the direct sound (it isn't really direct, but it's part of what we perceive as the initial sound), which is largely what is captured in the steady-state measurement.
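A quick back-of-the-envelope calculation shows why those low-frequency reflections fuse with what we hear as the direct sound. The wall distance and extra path length are just example numbers:

```python
# Speed of sound and an example bass frequency.
c = 343.0               # m/s
f = 100.0               # Hz
wavelength = c / f      # ~3.4 m
period_ms = 1000.0 / f  # 10 ms

# Hypothetical room: a nearby wall adds roughly 2 m of extra path
# for a first reflection compared to the direct sound.
extra_path = 2.0                    # m, illustrative
delay_ms = extra_path / c * 1000.0  # ~5.8 ms

print(f"Wavelength at {f:.0f} Hz: {wavelength:.2f} m, period: {period_ms:.1f} ms")
print(f"Reflection arrives {delay_ms:.1f} ms late, well within one period,")
print("so both the ear and the mic lump it in with the 'direct' sound.")
```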
But that doesn't mean a correction applied by an algorithm can't be better than no correction, even with a largely ideal speaker. The key is that either the software or the end user needs to know the dispersion of the speaker. We need more research characterizing how different dispersion patterns impact the in-room response and the relationship between that and listener preference curves. I have some ideas around this, but no definitive research to cite; it's something we probably need to investigate further. In any case, based on this information, the shape of the preference curve can be obtained. In addition, we still want well-designed speakers, as I don't think it is possible or desirable to correct the speakers themselves this way, especially with regard to dispersion pattern anomalies.
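As a rough sketch of the kind of thing I mean, dispersion data could be blended into a predicted in-room curve along the lines of the estimated in-room response weighting used with spinorama-style (CTA-2034) data. Treat the weights and the placeholder curves below as approximate illustrations, not a validated preference model:

```python
import numpy as np

def estimated_in_room(listening_window_db, early_reflections_db, sound_power_db,
                      w_lw=0.12, w_er=0.44, w_sp=0.44):
    """Blend anechoic dispersion curves into a predicted in-room response.

    The weights roughly follow the CTA-2034-style estimated in-room response;
    treat them as approximate. Inputs are dB magnitudes on a shared frequency
    grid (hypothetical spinorama data for the speaker being corrected).
    """
    return (w_lw * listening_window_db
            + w_er * early_reflections_db
            + w_sp * sound_power_db)

# Illustrative placeholder curves; a real case would load measured spinorama data.
freqs = np.logspace(np.log10(20), np.log10(20000), 200)
lw = np.zeros_like(freqs)               # flat on/near axis
sp = -0.8 * np.log2(freqs / 20.0)       # made-up tilt standing in for narrowing dispersion
er = 0.5 * (lw + sp)                    # crude stand-in for the early-reflections curve
predicted_in_room = estimated_in_room(lw, er, sp)
```

The point of the sketch is that the same in-room measurement can come from very different combinations of on-axis and off-axis behavior, which is exactly why the software (or the user) needs the dispersion data before deciding what to correct.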
Dirac, one of the corrections I find gets it right more often than not, especially with lesser systems, uses a pretty sophisticated algorithm to assess the room. The multiple measurement points in that system aren't just about improving the response over a wider listening area; they are actually used to characterize the room and speakers. By taking multiple measurements at different points in space, you can begin to understand what is going on with the speakers versus the room. Is it diffraction? Is it related to an uneven polar response? Is it a room mode? SBIR? The software can figure that out, to some extent, and apply a correction that minimizes problems without introducing new ones. A common misunderstanding of that software (and many other sophisticated correction systems) is that it averages those measurements and applies an inverse of the transfer function. That is incorrect; what it actually does is create an inverse transfer function (with some limits applied for sanity's sake) based on a best-fit line that minimizes the variance at any one measurement point.
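To illustrate the difference between inverting an average and a constrained best fit, here's a toy sketch that is emphatically not Dirac's actual algorithm: it fits one smooth correction across all measurement points at once and caps the boost, so narrow position-dependent dips don't get chased:

```python
import numpy as np

def fit_smooth_correction(freqs, measured_db, target_db, n_basis=8, max_boost=6.0):
    """Toy multi-point fit, not any vendor's real algorithm.

    measured_db: (n_points, n_freqs) magnitudes from several mic positions.
    Fits a smooth correction (low-order polynomial in log-frequency) that
    minimizes the squared error to the target over all points jointly,
    then limits boost so seat-to-seat variation can't drive huge gains.
    """
    logf = np.log10(freqs)
    basis = np.vander((logf - logf.mean()) / logf.std(), n_basis)  # smooth basis
    error = target_db - measured_db                    # desired gain at each point
    big_basis = np.tile(basis, (measured_db.shape[0], 1))   # stack all positions
    coeffs, *_ = np.linalg.lstsq(big_basis, error.ravel(), rcond=None)
    correction = basis @ coeffs
    return np.clip(correction, None, max_boost)        # sanity limit on boost

# Hypothetical use: 5 mic positions, 200-point response grid of stand-in data.
freqs = np.logspace(np.log10(20), np.log10(20000), 200)
measured = np.random.randn(5, 200) * 2.0
correction = fit_smooth_correction(freqs, measured, target_db=np.zeros(200))
```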