
Google AI fixes video call sound quality with computer-generated syllables

AI helps to fill in the gaps caused by a poor internet connection


Google has created a new piece of artificial intelligence that can improve the audio quality of video calls with computer-generated syllables.

Called WaveNetEQ, the technology fills in the gaps caused by a flaky internet connection, when portions of words are lost because packets of data fail to make it from one person to the other.


Although the AI has been in development for some time, its arrival now is welcome news for the millions of people keeping in touch with friends, family and work colleagues with video calls while in isolation due to the coronavirus pandemic.

For now, the technology is limited to Google Duo, the company's video chatting phone app. To maintain privacy and Duo's end-to-end encryption, WaveNetEQ runs on the smartphone of each participant on the video call, so no data from your calls is shared with Google itself.

The AI has been trained to slot in syllable sounds and can fill gaps in audio of up to 120 milliseconds; if more than that is lost from a spoken word, the audio fades to silence until the call audio returns. This means the technology isn't capable of replacing entire words, but can help make speech sound more natural, restoring the fractions of seconds lost to a poor internet connection.
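The 120-millisecond limit described above can be sketched as a simple rule. This is an illustration only, not Google's code: the names `ConcealmentPlan` and `plan_concealment` are hypothetical, and the sketch assumes that the first 120 milliseconds of a longer gap are still covered by generated audio before the fade to silence, which the description does not spell out.

```python
from dataclasses import dataclass

# Limit quoted in the article; everything else here is a hypothetical sketch.
MAX_FILL_MS = 120


@dataclass
class ConcealmentPlan:
    synthesized_ms: int  # part of the gap covered by generated audio
    silent_ms: int       # remainder of the gap, after the audio has faded out


def plan_concealment(gap_ms: int, max_fill_ms: int = MAX_FILL_MS) -> ConcealmentPlan:
    """Split a gap of lost audio into a generated part and a silent part."""
    if gap_ms < 0:
        raise ValueError("gap_ms must be non-negative")
    # Assumption: generated audio covers the gap up to the limit, silence covers the rest.
    synthesized = min(gap_ms, max_fill_ms)
    return ConcealmentPlan(synthesized_ms=synthesized, silent_ms=gap_ms - synthesized)
```

Under these assumptions, an 80-millisecond gap is filled completely, while a 300-millisecond gap gets 120 milliseconds of generated audio followed by 180 milliseconds of silence.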

[Image: Video calling. The AI helps to fill in the gaps of patchy audio. Credit: Getty Images]

But it isn't just intended for those struggling to find usable 4G or Wi-Fi. Google admits that 99 percent of Duo video calls suffer from at least some audio-related issues. Of these, 20 percent lose more than three percent of their audio. That works out to roughly one in five of all Duo calls, meaning there are many occasions when the WaveNetEQ technology could seamlessly come to the rescue.

Google trained the system with a dataset containing audio from over 100 speakers in 48 different languages. This, Google says, "allows the model to learn the characteristics of human speech in general, instead of the properties of a specific language."

The company added: "To ensure WaveNetEQ is able to deal with noisy environments, such as answering your phone in the train station or in the cafeteria, we augment the data by mixing it with a wide variety of background noises."
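Google's statement does not say how the background noise is mixed in. One common approach, shown here purely as an assumption, is to scale a noise clip so that the speech sits at a chosen signal-to-noise ratio before adding the two together. The function name `mix_with_noise` and its parameters are hypothetical.

```python
import math


def mix_with_noise(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise power ratio is snr_db.

    speech and noise are equal-length sequences of float samples.
    This is a generic augmentation sketch, not Google's training pipeline.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    if not speech:
        return []

    # Average power of each signal.
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        return list(speech)  # silent noise clip: nothing to mix in

    # Noise power needed to hit the requested signal-to-noise ratio.
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(speech, noise)]
```

A lower `snr_db` produces a noisier training example, such as a train station; a higher value keeps the speech dominant.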

Owners of the Google Pixel 4 smartphone were the first to get the new technology, as it was quietly added to the handsets via a software update in December. It is now being rolled out to more devices, although Google hasn't yet said which.


THE GEARBRAIN