
Alexa, Siri and Google can be tricked by commands you can’t hear

Researchers have shown how attackers could trick voice assistants.

As tens of millions of delighted owners know, Siri, Alexa, Cortana and Google will do lots of useful things in response to voice commands.
But what if an attacker could find a way to tell them to do something their owners would rather they didn’t?
Researchers have been probing this possibility for a few years and now, according to a New York Times article, a team at the University of California, Berkeley has shown how it could happen.
Their discovery is that it is possible to hide commands inside audio, such as speech or music streams, in a way that is inaudible to humans: a listener hears something innocuous, while a virtual assistant interprets it as a specific command.
The researchers have previously demonstrated how this principle could be used to fool the Mozilla DeepSpeech speech-to-text engine.
The New York Times claims that researchers at UC Berkeley were able to:

…embed commands directly into recordings of music or spoken text. So while a human listener hears someone talking or an orchestra playing, Amazon’s Echo speaker might hear an instruction to add something to your shopping list.
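To give a rough flavour of how this class of attack works, the sketch below shows the general recipe behind adversarial audio: start from a benign recording, then iteratively nudge a small additive perturbation so that a speech-recognition model produces the attacker's chosen output, while keeping the change too quiet for a human to notice. This is only a toy illustration in PyTorch: the tiny stand-in classifier and single class label are assumptions made here for brevity, standing in for a real speech-to-text engine such as DeepSpeech and its transcription loss. It is not the researchers' actual code.

# Toy sketch of the idea behind adversarial audio perturbations.
# A real attack (e.g. against Mozilla DeepSpeech) optimises a full
# transcription (CTC) loss; the tiny linear "model" here is purely
# a stand-in so the example stays self-contained and runnable.

import torch

torch.manual_seed(0)

# Stand-in "speech recogniser": maps 16,000 audio samples to 10 class scores.
model = torch.nn.Sequential(
    torch.nn.Linear(16000, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)

audio = torch.randn(1, 16000)   # the benign carrier audio (e.g. music or speech)
target = torch.tensor([3])      # attacker-chosen output (stands in for a hidden command)

delta = torch.zeros_like(audio, requires_grad=True)   # the hidden perturbation
opt = torch.optim.Adam([delta], lr=1e-3)

for step in range(500):
    opt.zero_grad()
    logits = model(audio + delta)
    # Push the model toward the attacker's target output...
    attack_loss = torch.nn.functional.cross_entropy(logits, target)
    # ...while penalising loud perturbations, so a human hears only the carrier.
    quietness = 1e-2 * delta.abs().mean()
    (attack_loss + quietness).backward()
    opt.step()

print("model now outputs:", model(audio + delta).argmax().item())

In the research itself, the "keep it quiet" constraint is far more sophisticated, shaping the perturbation so that the carrier audio still sounds like ordinary music or speech to a human listener.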

How might attackers exploit this?
The obvious examples are manipulated audio buried inside a radio or TV broadcast, podcast, YouTube video or online game, or perhaps even autoplaying audio on a phishing website.
As for which commands, the answer is more or less anything the device can be asked to do, from dialling a phone number or visiting a website to buying something.
For example, the researchers claim they were able to hide the phrase “okay google, browse to evil.com” inside the sentence “without the dataset the article is useless.”


Any device that responds to voice commands is potentially vulnerable, which today means home speakers and smartphones.
The problem the research highlights is how little is known about how internet companies implement speech technologies and what, if any, safeguards are built in.
On the face of it, smartphones would be harder to manipulate because in most cases they require users to unlock them before their embedded digital assistants will activate. Always-on home speakers, by contrast, might be easier to target.
Equally, vulnerabilities have been found in the way the iPhone implements its lockscreen, while a malfunction of Google’s Home Mini left it recording everything it heard even when not asked to.
This research constitutes a red flag that these devices could, in theory, be remotely controlled, not evidence that they are being misused.
There does seem to be an unstoppable movement to embed voice control inside all sorts of devices that have never had such a feature before, including home security and door locks, which is opening up a whole new world of security and privacy concerns.
For now, it is much more likely that the current generation of devices would be targeted for unwanted surveillance (including by the companies themselves) than hijacked through advanced command spoofing.
But as security watchers know from experience, where theory goes, practice has a habit of following.

1 Comment

Totally agree with you. These devices are the same as unlocked desktops. They need to add voice-to-user recognition, like Star Trek: only those authorized can do some things, and even then it still takes an authorization code to do more powerful things like ejecting the warp core (billing a credit card), with double authorization when handing command to a new captain. How the Google/Siri/Alexa developers missed this (is there a programmer alive who hasn't seen the show/movies?) is weak at best.

