
Rigged YouTube videos can use Siri and Google Now to hijack your phone

Researchers have demonstrated that "hidden" voice commands can trick voice-activated assistants into doing things like downloading malware.

Researchers have cooked up a way to attack mobile devices that relies on a victim doing nothing more than listening to a boobytrapped YouTube video.

In the attack, “hidden” voice commands (garbled to the point that humans find them nearly impossible to understand) trick voice-activated assistants like Google Now or Siri into doing whatever the attacker wants.

All a victim has to do is listen to a rigged video from any of multiple sources, including a laptop, a desktop computer, a smart TV, a smartphone or a tablet.

The researchers, from the University of California, Berkeley, and Georgetown University, say on their project page that the hidden voice commands are “unintelligible to human listeners but which are interpreted as commands by devices.”

You can hear the voice commands, which are indeed muddled, in the researchers’ VoiceHack demonstration video.

As the video shows, a mobile phone acts on the commands as soon as the voice-based assistants decipher them.

As the researchers note in their paper, they were able to take over a phone because many devices nowadays have adopted an always-on model in which they continuously listen for possible voice input.

That makes it easy for humans to access and interact with their devices. The potential benefit of always-on voice assistants was vividly demonstrated last month, when an Australian mother saved her 1-year-old daughter’s life by activating her dropped iPhone from across the room, telling Siri to call for an ambulance.

But that always-on state also leaves mobile devices vulnerable to voice attacks from anyone who can produce sound within speaker range of a targeted device.
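To see concretely why always-on listening widens the attack surface, here is a minimal sketch of such a loop in Python, using the third-party SpeechRecognition package. This is purely illustrative (the package choice and the handle_command helper are our assumptions, not how Siri or Google Now is actually built): the loop acts on any intelligible audio within microphone range, with no notion of who, or what, produced it.

```python
# Illustrative always-on listening loop -- NOT how Siri or Google Now
# is implemented. Requires: pip install SpeechRecognition pyaudio
import speech_recognition as sr

def handle_command(text):
    # Hypothetical dispatcher; a real assistant would act here
    # (open a URL, place a call, toggle airplane mode, ...).
    print("Acting on command:", text)

recognizer = sr.Recognizer()
with sr.Microphone() as mic:
    recognizer.adjust_for_ambient_noise(mic)
    while True:  # "always on": the device never stops listening
        audio = recognizer.listen(mic)
        try:
            # Any nearby sound source reaches this point -- a person,
            # a TV, or a rigged YouTube video playing on a laptop.
            handle_command(recognizer.recognize_google(audio))
        except sr.UnknownValueError:
            pass  # audio the recognizer can't decode is ignored
```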

Of course, if a device owner hears the incoming command, they can just cancel it or take other corrective action.

That motivated the researchers to look for a way to hide those voice commands: i.e., to craft commands that a device will understand and act on, but that a human wouldn’t understand or possibly even notice.

In their demo, the researchers issued a command for the phone to open the site xkcd.com. That suggests, of course, that a phone could just as easily be instructed to open far nastier, malware-laden sites.

The possible repercussions of a successful attack, from their paper:

Depending upon the device, attacks could lead to information leakage (e.g., posting the user’s location on Twitter), cause denial of service (e.g., activating airplane mode), or serve as a stepping stone for further attacks (e.g., opening a web page hosting drive-by malware).

Such attacks could also be compounded if they were to be inflicted en masse, the researchers suggest: for example, hidden voice commands could be broadcast from a loudspeaker at an event or embedded in a trending YouTube video.

Their attacks proved successful even with background noise.

The team showed that attackers don’t need sophisticated knowledge of a device’s speech recognition system. They came up with a general attack procedure for generating commands (ones still somewhat intelligible to humans) that is likely to work against any modern voice recognition system, including Google Now.

That’s what they call the black-box model.
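The core of that black-box procedure is a feature round trip: extract the compact MFCC features that speech recognizers work from, then invert them back into audio, throwing away much of the fine detail humans use to understand speech. Below is a much-simplified sketch of that round trip, assuming the librosa library; the file names are placeholders, and the real attack repeatedly tunes parameters and tests candidates against the target recognizer rather than making a single pass.

```python
# Simplified sketch of the black-box obfuscation idea: round-trip a
# spoken command through MFCC features. File names are placeholders.
# Requires: pip install librosa soundfile
import librosa
import soundfile as sf

# Load a cleanly spoken command, e.g. "OK Google, open xkcd.com".
y, sr = librosa.load("spoken_command.wav", sr=16000)

# Extract a small number of MFCCs -- the compact representation many
# speech recognizers are built around.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Invert the features back into a waveform. Phase and fine spectral
# detail are lost, so the result sounds garbled to a human ear while
# still matching what a recognizer listens for.
garbled = librosa.feature.inverse.mfcc_to_audio(mfcc, sr=sr)
sf.write("hidden_command.wav", garbled, sr)
```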

The team also demonstrated a white-box model, in which attackers with significant knowledge of the internal workings of a speech recognition system can issue hidden voice commands that can’t be deciphered by humans.

This isn’t the first time that researchers have attacked mobile devices via their voice assistants. In October, a pair of French researchers created an attack to remotely hijack phones with radio waves.

José Lopes Esteves and Chaouki Kasmi, researchers for the French infosec agency ANSSI, described the radio wave attack in a talk at Hack in Paris and published their findings in the journal IEEE Transactions on Electromagnetic Compatibility.

In a video of their talk, they described the potential outcomes of such an attack: turning a phone into an eavesdropping device by commanding it to call an attacker’s monitoring phone, making it visit a malicious phishing website, creating embarrassing posts on the victim’s social media accounts, or launching a malicious app that could download malware.

But that remote attack was far more difficult to pull off than the more recent demonstration of hidden voice commands: to make their attack work, the French researchers sent FM radio signals from a laptop to an antenna, which transmitted the signals to a nearby voice-command enabled phone with headphones plugged in.

In that attack, the headphone cord acted as an antenna, sending commands through the microphone to a digital assistant like Siri.

In contrast, all an attacker has to do in the case of hidden voice commands is to get a target to listen to a video, and presumably not look at their phone for a bit.

But still, getting users to do that is more involved than simply delivering malware.

As we’ve noted before, Siri in particular has been vulnerable to being exploited to expose your personal information.

For advice on reviewing your phone’s security settings, please take a look at our popular article, Privacy and Security on Your Phone. (Covers iOS, Android and Windows Phone.)

The researchers also evaluated some potential defenses against these hidden voice command attacks, including notifying the user when a voice command is accepted, a verbal challenge-response protocol, and a machine-learning approach that they say detected the attacks with 99.8% accuracy.
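To make the machine-learning idea concrete (the features, model and file names below are illustrative assumptions, not the authors’ actual pipeline), a defender could train a simple binary classifier to separate normal speech from obfuscated commands based on audio-feature statistics:

```python
# Illustrative sketch of the machine-learning defense: flag incoming
# voice commands whose audio statistics look synthetic. Not the
# authors' actual pipeline. Requires: librosa, scikit-learn, numpy
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def clip_features(path):
    # Summarize a clip as the per-coefficient mean and variance of
    # its MFCCs.
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.var(axis=1)])

# Placeholder training sets a real deployment would have to collect:
# recordings of genuine speech and of obfuscated "hidden" commands.
normal = ["normal_01.wav", "normal_02.wav", "normal_03.wav"]
hidden = ["hidden_01.wav", "hidden_02.wav", "hidden_03.wav"]

X = np.array([clip_features(p) for p in normal + hidden])
y = np.array([0] * len(normal) + [1] * len(hidden))
clf = LogisticRegression(max_iter=1000).fit(X, y)

# At runtime, reject commands the classifier flags as hidden.
if clf.predict([clip_features("incoming.wav")])[0] == 1:
    print("Rejected: audio looks like a hidden voice command")
```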

Comments

I can clearly understand what it’s saying. It sounds like a robotic voice from the movies, but you can pretty much understand it. I don’t see how this is a “hack”.

Speaking of movies… how about a movie where this “hack” turns on airplane mode on your out-of-reach phone so that the phone can’t be used (because it’s out of reach, and Google Now doesn’t work in airplane mode)?

I agree, it sounds like the Borg issuing commands, but they are mostly intelligible.

This is a PR stunt: a recorded voice unlocking one device prepared by the researchers and placed in a carefully orchestrated environment.

Was the device trained on the gravelly voice in advance? We don’t know. How many takes were needed in the video? We don’t know. How well does the recorded voice work with phones provided by volunteers? We don’t know.

This is fun but it isn’t really “research” or “science”.

The headline should probably be “Students unlock own phone using recorded voice, upload video to YouTube.”

I tried it on my phone right here, with Google Now voice activation. I’ve had Now activate while watching videos of people actually speaking the hotword; it works with my voice and even worked while I watched the embedded video, but the robot voice did nothing. And Google’s voice recognition software is the best in the world, so I doubt any flaw will last very long.

That voice sounds so demonic that I can only imagine the reactions it would get in the wild. Just imagine the Christian moms on Facebook! And I thought it was bad when Satan’s subliminal messages had to be played backwards.

Seriously, though, I would love to see a demonstration of the white-box model. While I don’t think most people could understand the black-box model, it also doesn’t sound like something that belongs in your video (even with background noise). Unless you’re listening to heavy metal, I guess. :-)

Garbage. The commands are perfectly intelligible, just run through an audio filter. Even with background noise playing, the commands are pretty obvious.

How about crying-baby background noise?
Why a crying baby, you might ask?
Because an adult would notice something was wrong (and might even recognize the commands), but a toddler watching a finger-puppet-family video on YouTube wouldn’t understand anything and would only be frightened by the sounds.

This risk could be limited, while still allowing some benefit, if Siri were available at the lock screen but with severely limited functionality. When the phone is locked you can’t make a call, but you can call emergency services. Why should you be able to post to Twitter, search the web, or do other things in a locked state beyond the most basic necessities?

If you were watching a video, which of these would make you more suspicious…

1. A film scene set in a company meeting where the actors behave normally, including talking to each other and issuing innocent-sounding but carefully chosen phone commands?

2. A gravelly, disembodied voice that is suspicious but intelligible and that issues deliberately disguised phone commands sounding exactly like deliberately disguised phone commands?

Busted.

This is a non-issue.

My phone (N6P), watch (G Watch R) and tablet (N9) were all in the vicinity of this video when it was playing. They are trained to my voice and hyper-sensitive to my speech, as they will occasionally activate if I say something that sounds phonetically similar.
However, none of them activated when I isolated that sound and played it nearby. My guess is that the researchers’ test devices have the 3 voice-training models on Now/OK Google set so wide-ranging and vague that they’ll pick up a lot of variations of that voice.


Hey guys… Watch out for videos with sounds that can break your phone… They sound like this one… Press play.

Just do what I do and disable Siri. I never use that speech recognition crap anyways.

My Nexus 7 (2013) running AOSP 6.0.1 was sitting beside me along with my son’s Asus Zenfone 2 running AOSP 6.0.1; neither of them woke up or showed any recognition of the video above. Wow, I guess you really have to dumb down the curriculum to stay in line with “no child left behind”, don’t you? How is this any new hack or groundbreaking discovery? Frankly, if my son went to school there I would be livid that this is what my hard-earned tuition was paying for. Something my son could tell you on his own. And he’s 12.

Notice that in all the tests, they had the phone unlocked and sitting on the Google app, with no trained voice model, so it will respond to anyone. Interesting idea… for a middle school science fair project. This is in no way even remotely a security concern for anyone.

This demo didn’t do anything on my Nexus 6P, which completely ignored the robotic voice. Google Now works fine with my voice, though.
