12 Apr 2019

Sight and Sound – Keep It Out Of Sync



by Simon Byrne.

That’s right, out of sync. At least from an audience perspective.


There has been a lot of research done by the hearing aid industry to improve intelligibility for hearing aids. Some of that research can be applied to live events.

The really interesting thing to come out of the research is that intelligibility actually increased when a small delay was inserted in the audio chain, so that the audio lagged the accompanying vision.


That is, the subject understands better when the audio arrives slightly after they see the speaker’s lips move, and the amount of delay that works best is different for everyone. Conversely, comprehension dramatically decreases when the audio arrives slightly before the vision.

The research from the University of London also showed potential benefits of adjusting the time delay between audio and video signals for each individual, finding that speech comprehension improved in half of the participants by twenty words in every one hundred, on average. The amount of delay required is different for everyone!

When you think about it, that makes sense. In the natural world, humans are used to the sound arriving after the vision because light travels about 870,000 times faster than sound. That is, light arrives instantly for our purposes, while sound takes about one millisecond per foot (or three milliseconds per metre). We are used to hearing things after we see them, and how much later depends on how far away we are from the source.
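As a rough check on those rules of thumb, here is a minimal Python sketch. It assumes a speed of sound of about 343 metres per second (dry air at roughly 20 degrees Celsius), which the article does not state, and the function name is purely illustrative.

```python
SPEED_OF_SOUND_M_PER_S = 343.0          # assumption: dry air at about 20 degrees C
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in a vacuum
METRES_PER_FOOT = 0.3048


def acoustic_delay_ms(distance_m: float) -> float:
    """Time in milliseconds for sound to travel distance_m metres."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


# How many times faster light is than sound under the assumption above.
ratio = SPEED_OF_LIGHT_M_PER_S / SPEED_OF_SOUND_M_PER_S

print(f"light is about {ratio:,.0f} times faster than sound")
print(f"one foot:  {acoustic_delay_ms(METRES_PER_FOOT):.2f} ms")
print(f"one metre: {acoustic_delay_ms(1.0):.2f} ms")
```

Under that assumption the figures come out at roughly 0.9 milliseconds per foot and 2.9 milliseconds per metre, so one millisecond per foot is a slightly generous but handy approximation.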

But here is another interesting fact. The brain apparently processes the audio quicker than the vision! That means some of the audio delay is accounted for by the added vision processing delay in the brain.

Probably as a result of so many variables, the boffins have also proved that it is very difficult for humans to match lip sync themselves (video editors will tell us this), and those matches are different for all of us.

It is no wonder that filmmakers use a clapper board at the start of shooting a scene. When the gate on the clapper closes, it delivers a definite visual cue as well as a perfectly timed audio clap, which makes it possible for the editors to quickly sync up the audio peak in the waveform on the timeline to the closed gate in the edit.

We place a lot of importance on keeping latency in audio systems to an absolute minimum for obvious reasons, and the accompanying video systems need to be in sync. But generally speaking, there is more latency in the vision systems.

So in the real world, what does this mean? Maybe a rethink is required when using video displays, so as to ensure that the audio arrives sufficiently after the video.

Instead of thinking of audio delay in milliseconds, we might think of it in feet. Roughly one millisecond equals one foot. At front of house, say one hundred feet from the stage, the audio and lighting folks are experiencing about one hundred milliseconds of delay, or a tenth of a second. If the video display only has fifty milliseconds of delay, happy days. But those in the audience less than fifty feet away will, according to this research, perceive less intelligibility! This is due to the audio arriving before the matching vision.
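The arithmetic above can be sketched in a few lines of Python. This is only an illustration: it uses the rule of thumb of one millisecond per foot, assumes the loudspeakers and the screen are at about the same distance from the listener, and assumes the audio chain adds no latency of its own unless told otherwise. The function and parameter names are hypothetical.

```python
MS_PER_FOOT = 1.0  # rule of thumb from the text; the true figure is nearer 0.9 ms


def audio_lag_ms(distance_ft: float, video_latency_ms: float,
                 audio_latency_ms: float = 0.0) -> float:
    """How far the audio trails the picture at the listener, in milliseconds.

    Positive: the audio arrives after the vision.
    Negative: the audio arrives before the vision.
    """
    # Audio arrival = any electronic delay plus the acoustic travel time.
    audio_arrival_ms = audio_latency_ms + distance_ft * MS_PER_FOOT
    # Light travel time is treated as zero, so the picture arrives after
    # the video system's own latency.
    return audio_arrival_ms - video_latency_ms


for distance in (20, 50, 100):
    lag = audio_lag_ms(distance, video_latency_ms=50.0)
    if lag < 0:
        verdict = "audio leads the vision"
    elif lag == 0:
        verdict = "in sync"
    else:
        verdict = "audio trails the vision"
    print(f"{distance:>3} ft: {lag:+.0f} ms ({verdict})")
```

With fifty milliseconds of video delay, the listener at twenty feet hears the audio thirty milliseconds early, while front of house at one hundred feet hears it fifty milliseconds late. Passing an extra audio_latency_ms shifts every seat later by the same amount, which is one way to keep the front rows from hearing the audio first.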

It seems that audio lagging vision by as much as fifty milliseconds is acceptable, probably even preferred. However, based on this research, we never want audio to precede the vision, even slightly.


From CX Magazine – April 2019
CX Magazine is Australia and New Zealand’s only publication dedicated to entertainment technology news and issues – available in print and online.
© CX Media



