Naked Security

Listen up: is this really who you think it is talking?

Lyrebird, an AI startup, can produce uncannily good versions of real people's voices. What does it mean for identity fraud?

“This is yuge, they can make us say anything now, really anything,” says a robotic voice that sounds exactly like Donald Trump – if he had a mouth full of muffin.

That’s from a “conversation” between a fake Trump voice, a fake Barack Obama, and a fake Hillary Clinton, as they discuss a new algorithm to copy voices that’s been developed by a startup called Lyrebird.

Last month Lyrebird released a public demo containing a series of audio samples of fake speech, generated by its algorithm from one-minute samples of each speaker.

Here’s a sample of the fake Obama voice proclaiming that he’s not a robot (note: if these clips won’t play for you, it might be because you’re using Chrome; I couldn’t get them to play there, and others have reported the same problem. Try a different browser):

And here’s robot Trump saying that his intonation is always different:

Finally, here’s the crème de la crème: a surreal conversation that gives an overview of what Lyrebird is, while showing off the artificial intelligence-generated voice technology in action. It’s between the two presidents and Clinton:

As robot Barack Obama says, the “good news” is that the development team will offer the technology to anyone.

The potential for bad news can be found on the startup’s ethics page, where Lyrebird’s developers – a team of researchers from the Montreal Institute for Learning Algorithms at the University of Montreal – admit that their artificial intelligence voice algorithm could have dangerous consequences in terms of identity fraud:

Voice recordings are currently considered as strong pieces of evidence in our societies and in particular in jurisdictions of many countries. Our technology questions the validity of such evidence as it allows to easily [sic] manipulate audio recordings. This could potentially have dangerous consequences such as misleading diplomats, fraud and more generally any other problem caused by stealing the identity of someone else.

By releasing our technology publicly and making it available to anyone, we want to ensure that there will be no such risks. We hope that everyone will soon be aware that such technology exists and that copying the voice of someone else is possible. More generally, we want to raise attention about the lack of evidence that audio recordings may represent in the near future.

All it took to make the AI utter sentences that none of the subjects has spoken in real life was a snippet of recorded voice. Lyrebird says the algorithm needs as little as one minute of a speaker’s recorded audio to compute a unique key that defines his or her voice.
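To give a rough sense of what such a “voice key” might look like, here’s a toy Python sketch. It is not Lyrebird’s method, which hasn’t been published; it simply averages standard spectral features (MFCCs) over a short recording to get a fixed-size fingerprint of a voice, and the file name is a placeholder:

    # Toy illustration only – not Lyrebird's (unpublished) algorithm.
    # Summarise ~1 minute of speech as a fixed-size vector ("voice key")
    # by averaging MFCC spectral features over time.
    import numpy as np
    import librosa

    def voice_key(path, sr=16000, n_mfcc=20):
        audio, _ = librosa.load(path, sr=sr)          # load the recording
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
        # Mean and standard deviation over time give a same-size summary
        # no matter how long the clip is.
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    key = voice_key("speaker_sample.wav")             # placeholder file name
    print(key.shape)                                  # (40,) with these defaults

Whatever Lyrebird actually computes will be far richer than this, but the principle is the same: a short recording is boiled down to a compact description of how the speaker sounds.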

Still, the more it hears, the better it does. As Scientific American explains, Lyrebird can train its system to learn the pronunciations of characters, phonemes and words in any voice by listening to hours of spoken audio.

Voice assistants such as Siri and Alexa work by “cobbling together words and phrases from prerecorded files of one particular voice”. Rather than just stitch words together, the Lyrebird AI extrapolates to generate completely new sentences that it can flavor with varying intonations and emotions.
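For contrast, the “cobbling together” approach can be imagined as something like the toy sketch below, which simply concatenates prerecorded word clips in order (the clip files are placeholders, and real assistants are far more sophisticated than this):

    # Toy sketch of concatenative synthesis: stitch prerecorded word clips
    # together in sentence order. Clip files are placeholders.
    import numpy as np
    import soundfile as sf

    WORDS = ["i", "am", "not", "a", "robot"]
    clips = {w: sf.read(f"{w}.wav")[0] for w in WORDS}   # one recording per word

    def stitch(sentence, sr=16000):
        gap = np.zeros(sr // 10)                         # ~0.1 s of silence
        parts = []
        for word in sentence.lower().split():
            parts += [clips[word], gap]
        return np.concatenate(parts)

    sf.write("stitched.wav", stitch("I am not a robot"), 16000)

The output can only ever be rearranged pieces of what was recorded; a generative system like Lyrebird’s is not limited to that.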

Scientific American reports that the key is artificial neural networks: algorithms that take in data and learn patterns from it, loosely mimicking the way neurons in a human brain work.
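Lyrebird hasn’t published its architecture, but the general idea can be sketched as a small network that takes a sequence of phoneme IDs plus a speaker’s voice key and predicts acoustic features. Everything below (the layer sizes, the GRU, the 80-dimensional output) is an illustrative assumption, not the real model:

    # Generic illustration of a neural text-to-speech model conditioned on a
    # speaker "voice key" – an assumption-laden sketch, not Lyrebird's model.
    import torch
    import torch.nn as nn

    class ToyVoiceModel(nn.Module):
        def __init__(self, n_phonemes=50, key_dim=40, hidden=128, n_acoustic=80):
            super().__init__()
            self.embed = nn.Embedding(n_phonemes, hidden)   # phoneme IDs -> vectors
            self.key_proj = nn.Linear(key_dim, hidden)      # voice key -> same space
            self.rnn = nn.GRU(hidden, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_acoustic)        # e.g. spectrogram frames

        def forward(self, phonemes, voice_key):
            # Add the speaker's key to every timestep so the whole sentence
            # is generated "in" that voice.
            x = self.embed(phonemes) + self.key_proj(voice_key).unsqueeze(1)
            h, _ = self.rnn(x)
            return self.out(h)

    model = ToyVoiceModel()
    phonemes = torch.randint(0, 50, (1, 12))    # a short phoneme sequence
    key = torch.randn(1, 40)                    # stand-in for a computed voice key
    frames = model(phonemes, key)               # shape (1, 12, 80)

Training a network like this on hours of speech is what lets it learn the pronunciations mentioned above; a separate component (a vocoder) would then turn the predicted frames into an audible waveform.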

Lyrebird has competition: WaveNet, another deep-learning speech-synthesis system, this one from the Google-owned company DeepMind. Alexandre de Brébisson, a PhD student at the Montreal Institute for Learning Algorithms laboratory at the University of Montreal, told Scientific American that Lyrebird is much faster than WaveNet at generation time:

We can generate thousands of sentences in one second, which is crucial for real-time applications. Lyrebird also adds the possibility of copying a voice very fast and is language-agnostic.

The voices aren’t 100% convincing. There’s no sound of breathing or of lips moving, noted Timo Baumann, a speech processing researcher at Carnegie Mellon University who isn’t involved with Lyrebird. Plus there’s that garbled, muffin-in-the-mouth quality, which Scientific American referred to as “a buzzing noise and a faint but noticeable robotic sheen.”

But if you’re not paying close attention or aren’t paranoid enough, Lyrebird could pull the wool over your eyes. As Lyrebird’s developers point out, the technology is proof that voices can be faked.

De Brébisson says the Lyrebird API will be out “soon”. The first samples will be free; after that, the startup plans to make money by charging developers and companies by the number of samples they order.

Lyrebird suggests that the voice samples could be used for personal assistants, for reading audiobooks in famous voices, for connected devices of any kind, for speech synthesis for people with disabilities, and for animated movies or video game studios.

By the way, that name, Lyrebird, is absurdly perfect. TechCrunch pointed to this BBC clip of David Attenborough, in the rainforest, recording the mating calls of the Australian lyrebird: an exquisite mimic of the calls of at least 20 other species. Treat your ears to the pleasure of hearing the lyrebird also commit identity theft by imitating other sounds it hears in the forest, including chainsaws, car alarms, camera shutters, and camera shutters with motor drives:



4 Comments

How can the fact that the lyrebird needs to emulate a chainsaw to “blend in” in the forest be a treat with which to pleasure your ears?


1. Did Attenborough refer to it as “blending in?” I believe the point is to show off the lyrebird’s ability to replicate a wide variety of complex sounds in order to attract a mate.
2. Referring to treating your ears to the joy of a chainsaw is irony.
3. Explaining irony makes me sad. I can hear the sound of the air escaping that popped balloon. Mimic that, Lyrebird!


Maybe someone like Trump gets varying intonation because they just fed in his voice as it naturally sounds when he gives speeches, whereas someone who spends hours reading out random, sometimes tongue-twisting sentences the site gives them – sentences that don’t engender much enthusiasm – ends up as Mr. Robot Voice. lol.

