I feel like you are talking about a Control System 401 solution when you are at a Control System 001 level of understanding.
You have an idea, but I'm not sure what your actual programming skills are. You seem to be looking for a plug-n-play solution to a problem that, to my knowledge, doesn't have one. That doesn't mean a solution isn't out there; I'm just not aware of one.
What this means is that you need to take parts and pieces and put them all together into a unified solution. You want a tone to play or lights to flash at levels appropriate to the ambient environment? Then you need to read that ambient environment, convert those readings to usable values, read the doorbell press, convert that to an action, and then have actions occur based upon the available information in a meaningful and reliable way.
Knowing the volume of the Denon is almost meaningless.
Why? Because the Denon's volume setting is not a measurement of how loud the speakers will actually play. The Denon is a pre-amp and an amplifier that is then connected to speakers. So, what you need to know is the level at which the speakers actually play the tone at various Denon volume settings, and then adjust the playback through those speakers according to the sound level in the room, measured in real time or averaged over the last few seconds.
If the Denon is the only source of audio in the room, and the measurements are taken from the room while the Denon is making those sounds, then the input on the Denon would need to change to play the chime. That cuts off the sound already playing in the room, which makes the need for a 'loud' chime meaningless anyway.
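To make that concrete, here is a rough Python sketch of the two pieces: averaging the room level over the last few readings, and looking up which volume setting gets the chime above it. Everything in it is an assumption for illustration. The calibration numbers are made up (you would have to measure your own speakers with a meter), and the names AmbientTracker, CALIBRATION, and volume_for_chime are hypothetical.

```python
from collections import deque

class AmbientTracker:
    """Keeps a short rolling window of room level readings (dB SPL)."""

    def __init__(self, window=5):
        self.samples = deque(maxlen=window)  # old readings fall off automatically

    def add(self, db_spl):
        self.samples.append(db_spl)

    def average(self):
        # Plain mean of the dB readings; crude, but fine for a sketch.
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)

# Hypothetical calibration: (volume setting, chime level measured in the room).
# These numbers are invented; real ones come from measuring your own speakers.
CALIBRATION = [(30, 55.0), (40, 65.0), (50, 75.0), (60, 85.0)]

def volume_for_chime(ambient_db, margin_db=10.0):
    """Pick the lowest calibrated setting that plays margin_db above the room."""
    target = ambient_db + margin_db
    for volume, level in CALIBRATION:
        if level >= target:
            return volume
    return CALIBRATION[-1][0]  # room is too loud; use the highest setting we measured

tracker = AmbientTracker(window=3)
for reading in (50.0, 60.0, 70.0):
    tracker.add(reading)
chosen = volume_for_chime(tracker.average())
```

Note that this only decides what setting you want. It says nothing about where the readings come from or how the setting reaches the amplifier, which is where most of the work is.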
This system would actually need a completely different set of speakers.
It would need something that could create the chime.
It would need something to hear the current audio levels in the room.
It would need an effective way to convert the audio levels to usable responses.
It would need a trigger device (doorbells).
It would need a way to convert those triggers to actionable items.
It would need a lot of code to make all that happen, which is a lot more than an 'if-then' statement, unless you get equipment designed from the ground up to do all of those things.
That's pretty much where a piece like a Biamp can come into play. A 'ducker' is the audio equivalent of an if-then: if the audio is at a certain level, then trigger the next volume level. You stack them and they automatically set the volume level for an output to feed an amp at appropriate levels.
You still need an external channel of amplification per speaker zone.
You also need to wire for those speakers.
You also need the speakers themselves.
Oh, and a mic in every room along with the wiring to those mics.
But, the Biamp I listed above is one of the only 'canned' solutions to what you are talking about doing.
Which, hopefully you now realize, isn't a simple plug-n-play solution.
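For what the stacked if-then idea looks like as logic, here is a minimal Python sketch. It is not how a Biamp is actually configured (that is done in their own design software), and the thresholds, gains, and the names STAGES and chime_gain are all hypothetical numbers and names picked for illustration.

```python
# Each stage is one "if-then": if the room is at or above this level (dB SPL),
# then use this output gain (dB) for the chime. Values are invented.
STAGES = [
    (50.0, -20.0),
    (60.0, -12.0),
    (70.0, -6.0),
    (80.0, 0.0),
]

def chime_gain(ambient_db, stages=STAGES, floor_gain=-30.0):
    """Walk the stack; the last stage whose threshold is met wins."""
    gain = floor_gain  # quiet room: play the chime at the lowest gain
    for threshold, stage_gain in stages:
        if ambient_db >= threshold:
            gain = stage_gain
    return gain

quiet_room = chime_gain(45.0)
tv_on = chime_gain(65.0)
party = chime_gain(85.0)
```

The hardware does this per output, continuously, with the mic feeding it directly. Doing the same thing yourself means you also own the mic input, the timing, and the output stage.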
Denon is nice. They publish their serial (RS-232) protocol so that you can get feedback and good information about the current state of their receivers, at least the parts they have decided to expose. If you are now Googling 'serial protocol' or 'RS-232', then you are at the very front end of learning how control systems work. If you are looking for a nice GUI that is just handed to you, for a solution that may not exist at this point, and you don't want to code everything yourself, then it just isn't going to happen.
Having a doorbell that works on your phone is nifty, but it's not open source. It's the exact opposite of open source. Unless they have released the API to the general public, and it is usable, and it provides the information that you need, it's completely closed and useless to you.
So, I'm not sure how far down this road you have gone. I'm not sure what type of doorbell you have that is tied into your server. I'm not sure whether you've played with the API directly, or what you are doing with that information. It could be a lot, or it could be nothing. But the rooms you want to use it in are the parts that need a lot of sensors tied into things. There are DiY home automation systems with IP controlled devices and sensors on the end of them that may allow you to move forward with this idea. It will not be plug-n-play, though. It will be testing and coding, and you will still need to give some consideration to speakers, mics, and amps to make it all work.
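To give a taste of what 'feedback' means at this level, here is a small Python sketch that builds a master volume query and parses the reply. It assumes, from my reading of Denon's published protocol, that commands are plain ASCII ending in a carriage return, that "MV?" asks for the master volume, and that the receiver answers with something like "MV50" or "MV505" (three digits meaning a half step, 50.5). Check the protocol document for your exact model before trusting any of that. The function names are hypothetical, and the actual serial port handling is left out on purpose.

```python
def build_volume_query():
    """Bytes to send over the serial link (assumed format: ASCII + carriage return)."""
    return b"MV?\r"

def parse_master_volume(line):
    """Return the volume as a float, or None if the line is not a volume report.

    Assumed reply shapes: 'MV50' -> 50.0, 'MV505' -> 50.5.
    Other lines, such as 'MVMAX 98' or 'PWON', are ignored.
    """
    text = line.strip()
    if not text.startswith("MV") or text.startswith("MVMAX"):
        return None
    digits = text[2:]
    if not digits.isdigit() or len(digits) not in (2, 3):
        return None
    value = float(digits[:2])
    if len(digits) == 3:
        value += int(digits[2]) / 10.0  # third digit is tenths (half steps)
    return value

replies = ["PWON", "MVMAX 98", "MV505\r", "MV50"]
volumes = [parse_master_volume(r) for r in replies]
```

Even this is only reading one value. The receiver also sends lines you did not ask for, so real code has to keep listening and sort through everything that comes back.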
It doesn't have to be expensive, but it depends on how much time you are willing to put into all of it.
Saying "You just read the ambient noise level." - is easy to say and do.
Saying "You need to convert that ambient noise level from one device and make it an actionable item on another, completely separate device." - is the crux of control system programming.
In my world: When a 3-way divisible room is opened up, I need to combine all the speakers in the space into a single zone and have all the touchpanels mirror one another. When someone plugs a computer into one of the (now) available four connection points, I will turn on all the displays in the 3 combined spaces and play that computer on those displays, if the system is in automatic mode. Otherwise, they can choose which displays or room audio to send that computer to. Then they can mix and match audio/video sources as they desire. If all the computers are currently disconnected, and they are not in an active phone call, the system should turn itself off automatically after five minutes.
That paragraph was about 120 hours of work and testing to get right.
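To show why, here is just the last sentence of that description (the five-minute auto-off) as a Python sketch. This is not my actual code, which runs on a control processor. The RoomState fields and the tick function are hypothetical names, and it ignores the room combining, display routing, and automatic mode entirely.

```python
from dataclasses import dataclass
from typing import Optional

IDLE_TIMEOUT_S = 5 * 60  # five minutes, from the description above

@dataclass
class RoomState:
    connected_computers: int = 0
    in_phone_call: bool = False
    powered_on: bool = True
    idle_since: Optional[float] = None  # time the room last became idle

def tick(state, now):
    """Call periodically with the current time in seconds.

    Returns True only on the call that actually shuts the system off.
    """
    if not state.powered_on:
        return False
    idle = state.connected_computers == 0 and not state.in_phone_call
    if not idle:
        state.idle_since = None  # any activity resets the countdown
        return False
    if state.idle_since is None:
        state.idle_since = now  # idle period starts now
        return False
    if now - state.idle_since >= IDLE_TIMEOUT_S:
        state.powered_on = False
        state.idle_since = None
        return True
    return False

room = RoomState()
results = [tick(room, t) for t in (0, 299, 300)]
```

That is one rule out of about a dozen in that paragraph, and it still has to deal with resets, timing, and what counts as 'idle'.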