Naked Security

Snoops can tell what you type while you Skype, researchers find

Researchers developed an acoustic eavesdropping attack scenario using VoIP, hitting an accuracy rate of up to 91.7%.

If you type on a laptop or desktop keyboard while you Skype, call participants can snoop on what you’re writing, according to new research.

According to a paper from researchers at the University of California, Irvine; the Sapienza University of Rome; and the University of Padua, the sound of keystrokes, or acoustic emanations, can be recorded during a Skype voice or video call and later reassembled as text.

Gene Tsudik, Chancellor’s Professor of computer science at UCI and one of the coauthors, told ScienceBlog that eavesdroppers can learn exactly what you type, including confidential information such as passwords “and other very personal stuff.”

Acoustic snooping on keystrokes has been shown to be theoretically feasible in the past, but it’s been pretty much in the realm of James Bond.

For example, researchers showed that a smartwatch’s motion sensors could be used to detect which keys you’re pressing with your left hand (or whichever hand the watch is on), and thus to guess at the words you’re typing.

But cybercrooks would have had to create an app that camouflages itself – for example, as a pedometer – and use it to track what someone types.

Before that, there was the team of researchers from Georgia Tech who demonstrated how to spy on what was typed on a regular desktop computer’s keyboard via the accelerometers of a mobile phone placed nearby, using special software to analyze vibrations set off by keystrokes.

That was a tougher proposition still: the phone had to be within 3 inches of the keyboard. Attackers would be out of luck if their targets left their phones in their pockets or purses, or simply moved them more than 3 inches away.

The beauty (or hazard, if you will) of Skype eavesdropping is that a snooper doesn’t need physical proximity to a target, precise profiling of the victim’s typing style and keyboard, or a significant amount of the intended victim’s typed information and its corresponding sounds.

The researchers from Italy and California are calling their new acoustic eavesdropping attack Skype & Type (S&T), though it’s not just Skype that’s vulnerable. In fact, any Voice-over-IP (VoIP) software will do.

But Skype is one of the most popular VoIP applications out there, and the researchers found that it picks up enough audio of nearby typing to reconstruct the keystrokes, including randomly generated passwords or PINs, with minimal profiling of the typist’s typing style and keyboard.

The ability to grab random keystrokes is a significant advance on earlier work, including the 2011 Georgia Tech attack described above, which only worked reliably on words of three or more letters.

Anybody following sensible security practice doesn’t use a dictionary word as a password (though they may well be using the passphrase technique of stringing words together, made famous by the xkcd cartoon’s correcthorsebatterystaple).

Earlier acoustic attacks relied on the characteristics of collected keystroke pairs, compared against a dictionary.
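To make the dictionary idea concrete, here is a minimal Python sketch under heavily simplified, hypothetical assumptions: each keystroke is reduced to one coarse label (whether the key sits on the left or right half of a QWERTY keyboard), and only dictionary words whose label sequence matches the observed one survive. The helper names `hand_pattern` and `candidate_words` and the word list are illustrative inventions, not any published attack’s features.

```python
# Simplified, hypothetical model: each keystroke is reduced to one coarse label,
# "L" or "R", for the half of a QWERTY keyboard the key sits on (the usual
# touch-typing split). This is much cruder than the pair-based features
# described in the article.
LEFT_KEYS = set("qwertasdfgzxcvb")


def hand_pattern(word: str) -> str:
    """Label each character 'L' if it is on the left half, otherwise 'R'."""
    return "".join("L" if ch in LEFT_KEYS else "R" for ch in word.lower())


def candidate_words(observed: str, dictionary: list[str]) -> list[str]:
    """Keep only dictionary words whose L/R pattern matches the observed one."""
    return [w for w in dictionary if hand_pattern(w) == observed]


# Toy word list, made up for illustration.
words = ["horse", "staple", "battery", "correct", "house", "mouse"]

# The eavesdropper only "hears" the L/R pattern of the typed word "house".
print(candidate_words(hand_pattern("house"), words))
```

Even this crude label leaves “house” and “mouse” indistinguishable, but it still prunes the list; a randomly generated password, which appears in no dictionary, gives this kind of matching nothing to work with.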

The 2015 smartwatch technique analyzed the timing of each keystroke and the displacement of the watch as the wearer moved his or her wrist to reach for keys that were nearer or further away.

With this newly described acoustic eavesdropping technique, an attacker familiar with a target’s typing style and type of keyboard (different keyboards have different acoustics) can guess any random key pressed with an accuracy of 91.7%.


It’s possible to build a profile of the acoustic emanation generated by each key on a given keyboard.

For example, the T on a MacBook Pro ‘sounds’ different from the same letter on another manufacturer’s product. It also sounds different from the R on the same keyboard, which is right next to T.
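One simple way to picture such a per-key profile is a nearest-centroid classifier: average the features of many recorded presses of each key, then label a new sound with the key whose average it sits closest to. The Python sketch below assumes the feature vectors already exist; the three-dimensional numbers in the hypothetical `PROFILES` table are made up for illustration, and the sketch is not meant to reproduce the researchers’ actual feature extraction or classifier.

```python
import math

# Hypothetical per-key "acoustic profiles". In a real attack each would be an
# average of feature vectors extracted from many recorded presses of that key;
# these three-dimensional numbers are invented for illustration only.
PROFILES = {
    "r": [0.9, 0.2, 0.4],
    "t": [0.8, 0.3, 0.7],
    "y": [0.1, 0.9, 0.5],
}


def rank_keys(sample, profiles=PROFILES):
    """Order keys from most to least likely: nearest profile (Euclidean) first."""
    return sorted(profiles, key=lambda key: math.dist(sample, profiles[key]))


# Features of a newly overheard keystroke (also made up).
print(rank_keys([0.82, 0.28, 0.65]))
```

Returning a ranked list rather than a single guess is useful to an attacker: even when the top guess is wrong, a password cracker can try the next-ranked keys first.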

Even if an eavesdropper is ignorant of the keyboard being used and the typing style of a target, the accuracy can still hit 41.89%.
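A quick back-of-the-envelope calculation shows why per-key accuracy doesn’t translate directly into recovering a whole password in one shot. The Python sketch below makes the simplifying assumption that each keystroke is guessed independently and that the quoted accuracy is the probability that the top guess for a single character is right; `whole_password_probability` is a hypothetical helper.

```python
# Simplifying assumption: keystrokes are guessed independently, and the quoted
# per-key accuracy is the chance that the top guess for one character is right.
def whole_password_probability(per_key_accuracy: float, length: int) -> float:
    """Chance that every one of `length` characters is guessed correctly."""
    return per_key_accuracy ** length


for accuracy in (0.917, 0.4189):
    p = whole_password_probability(accuracy, 10)
    print(f"{accuracy:.2%} per key -> {p:.3%} chance of getting all 10 right")
```

Under that assumption, a 10-character password comes out entirely right only about 42% of the time at 91.7% per key, and only about 0.017% of the time at 41.89% per key, which helps explain why the results are framed as a reduction in brute-force attempts.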

Are those results sufficient to guess somebody’s password?

The researchers said that if the attacker’s goal were to eavesdrop on a random password, trying the letters guessed at by a Skype & Type attack would reduce the average number of brute-force attempts to crack it by up to 12 orders of magnitude.

Even in the most challenging attack scenario, they say it would still reduce the brute-force attempts by one order of magnitude.
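The following Python sketch shows, under a deliberately simplified model, how candidate guesses shrink a brute-force search. It assumes (hypothetically) that the attacker keeps the k best guesses for each keystroke and that the true key is always among them, so the search covers k**n combinations instead of alphabet_size**n. This is not necessarily how the researchers computed their figures, and `orders_of_magnitude_saved` is an illustrative name.

```python
import math


def orders_of_magnitude_saved(alphabet_size: int, k: int, length: int) -> float:
    """log10 of how many times smaller the search gets when only the top-k
    guesses per keystroke are tried (assumes the true key is always in them)."""
    if not 1 <= k <= alphabet_size:
        raise ValueError("k must be between 1 and alphabet_size")
    full_search = alphabet_size ** length   # every possible password
    reduced_search = k ** length            # only combinations of top-k guesses
    return math.log10(full_search) - math.log10(reduced_search)


# 10-character password over [A-Za-z0-9], keeping the 5 best guesses per key.
print(f"{orders_of_magnitude_saved(62, 5, 10):.1f}")
```

In this model, keeping the top 5 guesses per keystroke for a 10-character password over the 62-symbol alphabet saves about 10.9 orders of magnitude, while keeping all 62 guesses saves nothing.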

The researchers showed that their Skype & Type attack can also handle typical VoIP quality problems, including internet bandwidth fluctuations that cut call quality and the interruption of people speaking over the sound of typing.

Tsudik told ScienceBlog that this type of attack isn’t possible with touch-screen or holographic keyboards and keypads.

Plus, an attacker would have to be on the call, given that it would be extremely difficult to get past Skype call encryption to intercept keystrokes.

But that still leaves plenty of scenarios where people on a Skype call might be interested in snooping on others, Tsudik said.

The interesting thing is that people who talk on Skype are not always friends and do not always have mutual trust.

Imagine a call between lawyers on opposite sides of a legal case – or business competitors or diplomats representing different countries.


The authors of the paper claim “12 orders of magnitude improvement” (1,000,000,000,000-fold) in the conclusion, but calculate their best-case speed-up in the body of the paper as 10^7, which is seven orders of magnitude (10,000,000-fold).

As far as I can see, you can only get 12 orders of magnitude from the numbers in the paper if you take the cost of a raw brute force attack *against a password of 10 characters chosen randomly from 62 possibilities* [A-Za-z0-9] and divide it by the researchers’ very-best-case “accelerated attack” *against a password of 10 characters chosen from just 26 possibilities* [a-z].
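The sizes of the two keyspaces in that comparison are easy to check. This Python sketch computes 62^10 and 26^10 and how many orders of magnitude separate them; it covers only the change of alphabet and does not attempt to reproduce the paper’s accelerated-attack figures.

```python
import math


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of `length` characters over the alphabet."""
    return alphabet_size ** length


mixed = keyspace(62, 10)  # [A-Za-z0-9], 10 characters
lower = keyspace(26, 10)  # [a-z], 10 characters

print(f"62^10 = {mixed:,} (about 10^{math.log10(mixed):.1f})")
print(f"26^10 = {lower:,} (about 10^{math.log10(lower):.1f})")
print(f"Alphabet change alone: about 10^{math.log10(mixed / lower):.1f} times fewer candidates")
```

Switching from [A-Za-z0-9] to [a-z] accounts for roughly 3.8 orders of magnitude on its own, before any acoustic speed-up is applied.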

(In real life, of course, randomly-chosen passwords of the 10-from-62 sort are usually a sign of a password manager, and so they will almost always be autofilled or entered by copy-and-paste, which renders the comparison above moot – the only acoustic emanation for “typing” the password will be the sound of the user pressing Ctrl-V/Command-V to paste.)

I’d also be interested to see actual real-world results from this research, which mentions the real world a fair bit. The authors talk about “Skype and Type”, but in their tests, there wasn’t actually any real user who was typing and using Skype at the same time: the keystrokes were recorded in advance at CD quality and then played back one at a time into the Skype software. That doesn’t invalidate the research, but it means it’s not exactly real-world, either. Perhaps the authors ought to have mentioned that in the conclusion?


These are interesting experiments, but it seems to me there are two specific assumptions that greatly “help” the success rate:
First – the use of a typical “clicky” keyboard. If, like me, you have a totally silent keyboard then the difficulty of implementing acoustic analysis skyrockets.
Second – the watch accelerometer trick works well for those who are “hunt ‘n peck” typists. Those of us who learned true touch typing barely move their wrists at all.

I would love an opportunity to have someone try both of these tricks on me and see how that affected their success rate.


Also, I wonder how well it works when you factor in real typing, not just replaying individual typed letters one at a time, and if you use a microphone that isn’t the one built into your laptop (e.g. the throat mic built into some headphone cables).

Using the built-in mic for Skype often produces bad results, so many people avoid it…


Something similar can be inferred from the latest Hak5 YouTube video: Decrypt Morse Code via PC Sound Cards – Hak5 2108.
Using two computers close to each other, one could “hear” the other and then, with software, “decrypt” the Morse code being generated by the other, using just the microphone of the receiving computer and the speakers of the computer generating the tones.

Admittedly it’s a different scenario, but it shows that perhaps one should be careful at the library, coffee shop, etc., where someone’s laptop may be nearby!!

