Sound and light are both waves.
A green pixel is fundamentally the same as a 1000 Hz sine wave.
But video isn't light creation.
Video is just color. The full-spectrum light is provided by a bulb, or a "white" LCD/LED backlight, or the sun, or not at all.
You are welcome to tell me why I can't have a digital light bulb... but that's not the video portion.
I can describe (which is to say "record") a color in a useful way in zero time.
I'll do so now: FFFFFF.
Can you do that for audio? Is a single "pixel" (one sample) of a CD of any use at all? No. It's only useful relative to the others around it in the time domain.
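To make the "zero time" point concrete, here is a minimal sketch of decoding that one hex value by itself. The function name `decode_hex_color` is made up for illustration, and it assumes the usual 8-bits-per-channel RGB reading of a 6-digit hex color.

```python
def decode_hex_color(hex_color: str) -> tuple:
    """Split a 6-digit hex string into three 8-bit channel values.

    Assumption: the digits are read as RR GG BB, 8 bits per channel.
    """
    if len(hex_color) != 6:
        raise ValueError("expected exactly 6 hex digits")
    # Take the digits two at a time and parse each pair as base 16.
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


# One value, no neighbours, no time axis, and it still names a color.
white = decode_hex_color("FFFFFF")
print(white)  # (255, 255, 255)
```

Nothing in that call needs a previous or next value to make sense of the result.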
Even if I accepted that your premise interacted with my point: the transition from green to black in one frame could be described as a green light source that is "on for 1/60 sec, then off".
Of course, we digitize that value in shades, so it's "255 for 1/60 second, then 0".
Add the two other colors: "0,255,0 for 1/60 second, then 0,0,0".
Repeat for every pixel (I won't do that here).
There's digital right there. (At 1 nanosecond I will be 0,255,0.)
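As a rough sketch of that description, one pixel can be written as a value that is simply held for each frame. The names `frames` and `pixel_at` are hypothetical, and this assumes a 60 frames-per-second display and conventional RGB channel order, so full green is written 0,255,0.

```python
FRAME_SECONDS = 1 / 60  # assumption: 60 frames per second

# One pixel over two frames: full green, then black.
# Assumption: channels are in (red, green, blue) order.
frames = [(0, 255, 0), (0, 0, 0)]


def pixel_at(t_seconds, frames=frames, frame_seconds=FRAME_SECONDS):
    """Return the pixel's value at time t; each value is held for one frame."""
    index = int(t_seconds // frame_seconds)
    # Clamp so times before the first or after the last frame still answer.
    index = min(max(index, 0), len(frames) - 1)
    return frames[index]


print(pixel_at(1e-9))  # 1 nanosecond in: (0, 255, 0)
print(pixel_at(0.02))  # during the second frame: (0, 0, 0)
```

Any instant inside the first 1/60 second gets the same answer, which is the whole point of "on for 1/60 sec, then off".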
I'm a loudspeaker. Give me hypothetical digital audio data for exactly 1 unit of time. Tell me where I will want to be for every Nth value of time into which that 1 unit can be subdivided.
You can't. We don't deal with audio in discrete pieces like we do with video.