Rigged YouTube videos can use Siri and Google Now to hijack your phone

Researchers have cooked up a way to attack mobile devices that relies on a victim doing nothing more than listening to a boobytrapped YouTube video.

In the attack, “hidden” voice commands, garbled to the point that humans can barely understand them, trick voice-activated assistants like Google Now or Siri into doing whatever the attacker commands.

All a victim has to do is listen to a rigged video played on any of a number of devices, including a laptop, a desktop computer, a smart TV, a smartphone or a tablet.

The researchers, from the University of California, Berkeley, and Georgetown University, say on their project page that the hidden voice commands are “unintelligible to human listeners but which are interpreted as commands by devices.”

You can hear the voice commands, which are indeed muddled, in the researchers’ VoiceHack demonstration video:

As the video shows, a mobile phone acts on the commands as soon as the voice-based assistants decipher them.

As the researchers note in their paper, they were able to take over a phone because many devices nowadays have adopted an always-on model in which they continuously listen for possible voice input.
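To make that always-on model concrete, here is a minimal sketch of a continuous listening loop in Python, using the third-party speech_recognition package. It is only an illustration of the listen-and-act pattern the researchers exploit, not how Siri or Google Now are actually implemented; the wake word and the command handler are assumptions made up for the example.

```python
# Minimal sketch of an "always listening" voice loop (illustration only,
# not how Siri or Google Now actually work).
# Requires: pip install SpeechRecognition pyaudio
import speech_recognition as sr

WAKE_WORD = "ok assistant"   # hypothetical wake word for this example

def handle_command(text: str) -> None:
    """Act on whatever was heard. This is the dangerous part: the code
    has no idea whether the audio came from the owner, a YouTube video,
    or a loudspeaker across the room."""
    print(f"Executing command: {text}")

recognizer = sr.Recognizer()
with sr.Microphone() as mic:
    recognizer.adjust_for_ambient_noise(mic)
    while True:                               # the "always-on" part
        audio = recognizer.listen(mic)
        try:
            heard = recognizer.recognize_google(audio).lower()
        except (sr.UnknownValueError, sr.RequestError):
            continue                          # nothing intelligible heard
        if heard.startswith(WAKE_WORD):
            handle_command(heard[len(WAKE_WORD):].strip())
```

The point the researchers make is visible in the loop: any audio source within microphone range gets exactly the same treatment as the owner's own voice.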

That makes it easy for humans to access and interact with their devices. The potential benefits of always-on voice assistants were vividly demonstrated last month, when an Australian mother saved her 1-year-old daughter’s life by activating her dropped iPhone from across the room and telling Siri to call an ambulance.

But that always-on state also leaves mobile devices vulnerable to voice attacks from anyone who can play sound within speaker range of a targeted device.

Of course, if a device owner hears the incoming command, they can just cancel it or take other corrective action.

That motivated the researchers to look for a way to hide those voice commands: i.e., to craft commands that a device will understand and act on, but that a human wouldn’t understand or possibly even notice.

In their demo, the researchers issued a hidden command telling the phone to open the site xkcd.com. That suggests, of course, that a phone could just as easily be instructed to open far nastier, malware-laden sites.

The possible repercussions of a successful attack, from their paper:

Depending upon the device, attacks could lead to information leakage (e.g., posting the user’s location on Twitter), cause denial of service (e.g., activating airplane mode), or serve as a stepping stone for further attacks (e.g., opening a web page hosting drive-by malware).

Such attacks could also be compounded if they were to be inflicted en masse, the researchers suggest: for example, hidden voice commands could be broadcast from a loudspeaker at an event or embedded in a trending YouTube video.

Their attacks proved successful even with background noise.

The team showed that attackers don’t need sophisticated knowledge of a device’s speech recognition system. They came up with a general procedure for generating attack commands (difficult, though not impossible, for humans to understand) that is likely to work against any modern speech recognition system, including Google Now.

That’s what they call the black-box model.
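Roughly speaking, the black-box obfuscation keeps only the coarse acoustic features (MFCCs) that a recognizer listens to and discards the rest of the signal, so the machine still matches the command while humans lose most of the cues they use to understand speech. The snippet below is a rough, hedged sketch of that idea, not the researchers’ actual tooling; the file names, sample rate and parameter choices are assumptions for the example.

```python
# Rough sketch of the feature-extract-and-invert idea behind obfuscated
# voice commands: keep only the MFCC features a recognizer relies on,
# then resynthesize audio from them.  File names and parameters are
# illustrative assumptions, not values from the paper.
# Requires: pip install librosa soundfile
import librosa
import librosa.feature.inverse
import soundfile as sf

SR = 16000        # assumed sample rate
N_MFCC = 13       # typical number of coefficients

# Load a clearly spoken command, e.g. "OK Google, open example.com"
y, _ = librosa.load("command.wav", sr=SR)

# Reduce the waveform to the coarse spectral features recognizers use
mfcc = librosa.feature.mfcc(y=y, sr=SR, n_mfcc=N_MFCC)

# Invert the features back into audio: a recognizer may still "hear"
# the command, but it sounds like noise to most human listeners
y_obfuscated = librosa.feature.inverse.mfcc_to_audio(mfcc, sr=SR)

sf.write("obfuscated_command.wav", y_obfuscated, SR)
```

In the paper, candidates generated along these lines were then tested against the target recognizer, treated as an opaque oracle, and tuned until the machine accepted them.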

The team also demonstrated a white-box model, in which attackers with significant knowledge of the internal workings of a speech recognition system can issue hidden voice commands that can’t be deciphered by humans.

This isn’t the first time that researchers have attacked mobile devices via their voice assistants. In October, a pair of French researchers created an attack to remotely hijack phones with radio waves.

José Lopes Esteves and Chaouki Kasmi, researchers for the French infosec agency ANSSI, described the radio wave attack in a talk at Hack in Paris and published their findings in the journal IEEE Transactions on Electromagnetic Compatibility.

In a video of their talk, they described the potential outcomes of such an attack: turning a phone into an eavesdropping device by commanding it to call an attacker’s monitoring phone, directing it to a malicious phishing website, posting embarrassing messages on the victim’s social media accounts, or launching a malicious app that could download malware.

But that remote attack was far more difficult to pull off than the more recent demonstration of hidden voice commands: to make their attack work, the French researchers sent FM radio signals from a laptop to an antenna, which transmitted the signals to a nearby voice-command enabled phone with headphones plugged in.

In that attack, the headphone cord acted as an antenna, sending commands through the microphone to a digital assistant like Siri.

In contrast, all an attacker has to do in the case of hidden voice commands is to get a target to listen to a video, and presumably not look at their phone for a bit.

But still, getting users to do that is more involved than simply delivering malware.

As we’ve noted before, Siri in particular has been vulnerable to being exploited to expose your personal information.

For advice on reviewing your phone’s security settings, please take a look at our popular article, Privacy and Security on Your Phone. (Covers iOS, Android and Windows Phone.)

The researchers also evaluated some potential defenses against these hidden voice command attacks, including notifying the user when a voice command is accepted, a verbal challenge-response protocol, and a machine-learning approach that they say detected the attacks with 99.8% accuracy.
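As a hedged illustration of that last idea (not the researchers’ actual classifier), one could train a simple detector on acoustic features to separate normal speech from obfuscated commands. Everything below, the feature choice, the model and the directory layout, is an assumption made for the example.

```python
# Toy sketch of a machine-learning defense: classify clips as normal
# speech vs. obfuscated "hidden" commands.  An illustration of the
# general approach, not the detector from the paper; the features,
# model and directory layout are assumptions.
# Requires: pip install librosa scikit-learn numpy
from pathlib import Path
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def clip_features(path, sr=16000, n_mfcc=13):
    """Summarize a clip as the mean and std of its MFCCs."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Assumed layout: normal/*.wav are ordinary speech, hidden/*.wav are
# obfuscated commands collected for training.
X, y = [], []
for label, folder in enumerate(["normal", "hidden"]):
    for wav in Path(folder).glob("*.wav"):
        X.append(clip_features(wav))
        y.append(label)

X_train, X_test, y_train, y_test = train_test_split(
    np.array(X), np.array(y), test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.3f}")
```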
