Naked Security

Serious Security: Why learning to touch-type could protect you from audio snooping

Fast, quiet, smooth, consistent and low impact... why true hacker-grade touch-typing might keep you more secure.

Audio recordings are dangerously easy to make these days, whether by accident or by design.

You could end up with your own permanent copy of something you thought you were discussing privately, preserved indefinitely in an uninterestingly-named file on your phone or laptop, thanks to hitting “Record” by mistake.

Someone else could end up with a permanent transcript of something you didn’t want preserved at all, thanks to them hitting “Record” on their phone or laptop in a way that wasn’t obvious.

Or you could knowingly record a meeting for later, just in case, with the apparent consent of everyone (or at least without any active objections from anyone), but never get round to deleting it from cloud storage until it’s too late.

Sneaky sound systems

Compared to video recordings, which are worrying enough given how easily they can be captured covertly, audio recordings are much easier to acquire surreptitiously, given that sound “goes round corners” while light, generally speaking, doesn’t.

A mobile phone laid flat on a desk and pointing directly upwards, for example, can reliably pick up most of the sounds in a room, even those coming from people and their computers that would be entirely invisible to the phone’s camera.

Likewise, your laptop microphone will record an entire room, even if everyone else is on the other side of the table, looking at the back of your screen.

Worse still, someone who isn’t in the room at all but is participating via a service such as Zoom or Teams can hear everything relayed from your side whenever your own microphone isn’t muted.

Remote meeting participants can permanently record whatever they receive from your end, and can do so without your knowledge or consent if they capture the audio stream without using the built-in features of the meeting software itself.

And that raises the long-running question, “What can audio snoops figure out, over and above what gets said in the room?”

What about any typing that you might do while the meeting is underway, perhaps because you’re taking notes, or because you just happen to type in your password during the meeting, for example to unlock your laptop because your screen saver decided you were AFK?

Attacks only ever get better

Recovering keystrokes from surreptitious recordings is not a new idea, and results in recent years have been surprisingly good, not least because:

  • Microphone quality has improved. Recording devices now typically capture more detail over a wider range of frequencies and volumes.
  • Portable storage sizes have increased. Higher data rates can be used, and sound samples stored uncompressed, without running out of disk space.
  • Processing speeds have gone up. Data can now be winnowed quickly even from huge data sets, and processed with ever-more-complex machine learning models to extract usable information from it.
  • Cybersecurity is becoming ever more important. Collectively, more of us now care about protecting ourselves from unwanted surveillance, making research into sound-snooping ever more mainstream.

A trio of British computer scientists (it seems they originally met up at Durham University in the North East of England, but are now spread out across the country) has just released a review-and-research paper on this very issue, entitled A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards.

In the paper, the researchers claim to have:

…achieved a top-1 classification accuracy of 95% on phone-recorded laptop keystrokes, representing improved results for classifiers not utilising language models and the second best accuracy seen across all surveyed literature.

In other words, their work isn’t entirely new, and they’re not yet in the number-one spot overall, but the fact that their keystroke recognition techniques don’t use “language models” has an important side-effect.

Language models, loosely speaking, help to reconstruct poor-quality data that follows known patterns, such as being written in English, by making likely corrections automatically, such as figuring out that text recognised as dada brech notidifivatipn is very likely to be data breach notification.

But this sort of automated correction isn’t much use on passwords, given that even passphrases often contain only word fragments or initialisms, and that the sort of variety we often throw into passwords, such as mixing the case of letters or inserting arbitrary punctuation marks, can’t reliably be “corrected” precisely because of its variety.
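To make the contrast concrete, here is a minimal sketch in Python. It is not a real language model and not the researchers' method: it is a toy corrector that snaps each recognised word to the closest entry in a small, hypothetical vocabulary (`VOCAB`), using the standard library's `difflib`. The function name `correct` and the similarity cutoff of 0.6 are likewise assumptions made for illustration.

```python
import difflib

# Hypothetical mini-vocabulary, for illustration only.
# A real language model would draw on far more context than this.
VOCAB = ["data", "breach", "notification", "password", "keyboard", "security"]


def correct(text, vocab=VOCAB, cutoff=0.6):
    """Replace each word with its closest vocabulary entry, if any is close enough."""
    fixed = []
    for word in text.split():
        matches = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
        # No sufficiently similar entry: leave the word exactly as recognised.
        fixed.append(matches[0] if matches else word)
    return " ".join(fixed)


print(correct("dada brech notidifivatipn"))  # data breach notification
print(correct("xK7#pQ2!"))                   # xK7#pQ2! (nothing to correct it towards)
```

The misheard English phrase is repaired because each garbled word still sits close to a known word. The password-like string comes back unchanged, so any misrecognised character in it would stay wrong, which is why per-key accuracy matters so much more for passwords.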

So a top-tier “hey, you just hit the P key” recogniser that doesn’t rely on knowing or guessing what letters you typed just beforehand or just afterwards…

…is likely to do a better job of figuring out or guessing any unstructured, pseudorandom stuff that you type in, such as when you are entering a password.

One size fits all

Intriguingly, and importantly, the researchers noted that the representative audio samples they captured carefully from their chosen device, a 2021-model Apple MacBook Pro 16″, turned out not to be specific to the laptop they used.

In other words, because laptop models tend to use as-good-as-identical components, attackers don’t need to get physical access to your laptop first in order to capture the starting data needed to train their keystroke recognition tools.

Assuming you and I have similar sorts of laptop, with the same model of keyboard installed, then any “sound signatures” that I capture under carefully controlled conditions from my own computer…

…can probably be applied more or less directly to live recordings later acquired from your keyboard, given the physical and acoustic similarities of the hardware.

What to do?

Here are some fascinating suggestions based on the findings in the paper:

  • Learn to touch-type. The researchers suggest that touch-typing is harder to reconstruct reliably via sound recordings. Touch-typists are generally much faster, quieter, smoother and more consistent in their style, as well as using less energy when activating the keys. We assume this makes it harder to isolate individual keystrokes for analysis in the first place, as well as making the sound signatures of different keys harder to tell apart.
  • Mix character case in passwords. The researchers noted that when the shift key was held down before a keystroke was entered, and then released afterwards, the individual sound signatures were much harder to isolate and match. (Those annoying password construction rules may be useful after all!)
  • Use 2FA wherever you can. Even if you have a 2FA system that requires you to type in a 6-digit code off your phone (which many people do by holding their phone in one hand and hunting-and-pecking the numbers with the other), each code only works once, so recovering it doesn’t help a password-thieving attacker much, if at all.
  • Don’t type in passwords or other confidential information during a meeting. If you get locked out of your laptop by your screensaver or by a security timeout, consider popping out of the room briefly while you log back in. A little delay could go a long way.
  • Mute your own microphone as much as you can. Speak, or type, but don’t do both at once. The researchers suggest that Zoom recordings are good enough for keystroke recovery (though we think they tested only with high-quality local Zoom recordings, not with lower-quality cloud-based recordings initiated by remote participants), so if you are the only person at your end, muting your microphone controls how many of your keystrokes other people get to hear.
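To see why a snooped 2FA code is of so little use later, here is a minimal sketch of how time-based one-time codes can be generated, assuming the widely used TOTP scheme from RFC 6238 (built on HOTP from RFC 4226). The article doesn't say which 2FA system you might be using, so treat the scheme, the function names `hotp` and `totp`, and the demo secret (the published RFC test value, not a real key) as assumptions.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time code (RFC 4226) for a given counter value."""
    msg = struct.pack(">Q", counter)                 # counter as 8-byte big-endian
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                       # dynamic truncation offset
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time code (RFC 6238): HOTP over the current time step."""
    return hotp(secret, unix_time // step, digits)


# Demo secret taken from the RFC test vectors; never use it for a real account.
SECRET = b"12345678901234567890"

print(totp(SECRET, 59))        # 287082
print(totp(SECRET, 59 + 30))   # a different code in the next 30-second window
```

Each code depends on the current time window, so a code recovered from a recording stops matching once that window has passed. Whether a code can be replayed inside its own window depends on the server refusing codes it has already accepted, which this sketch doesn't model.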

15 Comments

I never learned to touch-type. However, my style of typing is very (shall we say) individual, and I type at a reasonable speed. I have been known to copy-type one-handed, while holding the source material in the other hand. And a colleague at one former workplace observed – correctly – that I play the piano. Hacker, make of that what you will!

I wonder if that individuality would work against you (by making your keystrokes not only different from one another, but also uniquely identifiable as yours)?

That’s quite the custom QWERTY layout Mr. Ducklin.

See comments below – it took someone else pointing out the error exactly (“no D!”) before I figured out what you meant :-)

I assumed you were referring to the weirdly coloured keys, not to the fact that the alphabet was incomplete… don’t learn to touch-type from that chart, anyway!

Let’s go back to the DVORAK keyboard, or better still, the one-handed DVORAK left or right keyboard, or we could fool these keyboard-recording mics by using speech-to-text…

I’m not sure if it made it into either of the movies, but in the book Dune, walking across the sand had to be done in a way that didn’t attract the sand worms. In the brief time we have left before we are beaming our thoughts directly into computers (or AI is advanced enough to simply guess what we are about to do without being told), should we be learning to type in a way that is random enough to not allow statistical and AI analysis? Typing to the beat of a chaotic metronome, perhaps?

Or perhaps we need a keyboard that randomly scrambles the key locations every 30 seconds to make this kind of attack impossible? Users already assume that security exists only to make their jobs harder, so they are unlikely to be surprised by such a concept…

“Shuffled” virtual numeric keypads were tried by the banks back in the 2000s when banking malware was huge.

The virtual keypad was meant to defeat keyloggers, but the crooks simply switched to reading out the relative mouse positions when you clicked on each digit instead.

Thus shuffled keypads became a thing, so the numbers were in different positions each time… but the crooks just took screenshots to record which digit was where, or popped up their own replica “known-random keypad” over the one from the bank’s website to phish your PIN.

Still, all those “but the crooks…” lines mean they had to do increasingly complicated things to defeat the security. If someone’s got enough access to put screen overlays between you and the login you are trying to do, you’re already basically defeated; it’s entirely unreasonable to expect any security measures at the site to protect you from someone with that much access to your local device. Likewise, shuffling the key positions periodically definitely would do more than touch typing to defeat acoustic snooping. Yes, it could still be defeated by someone who recorded the current status of the keyboard with each keystroke and coupled that with their acoustic knowledge of which key you pressed when... but when they’ve got that level of access to your machine, both of those things are already irrelevant: they could just keep a direct record of what you typed instead of worrying about what the keyboard looks or sounds like.

I may hate the idea because I wouldn’t want to type on a keyboard like that, but it *would* render knowledge about which physical keys you pressed at what times pretty much useless. (Especially considering that the amount of time it takes to find the key you want is likely to vary randomly when it might be anywhere, so no more “common pairs”.)

Agreed. But the extreme inconvenience of regularly reshuffling everyone’s keyboards would make this sort of mitigation unpopular, if not unusable…

…and it would certainly weigh against the “keep it smooth and consistent” mitigation of learning to touch-type :-)

Curiosity question: what ever happened to the ‘projected’ keyboards from the early 2000s? They were supposed to project a keyboard via laser or other light source onto the desktop and you would simply type on the physical desktop surface. Would such a keyboard also be subject to this style of attack?

I guess two main things happened: firstly, they didn’t really work that well; secondly, the iPhone.

The article discussing the importance of touch typing as a defense against audio snooping highlights a lesser-known yet critical aspect of cybersecurity. In today’s digital age, where personal privacy is at risk from various angles, even seemingly innocuous practices like typing can have security implications.

The concept that the sound of keyboard typing could potentially be used for audio snooping is both fascinating and concerning. Learning touch typing takes on a new level of significance as it not only enhances efficiency but also safeguards against potential privacy breaches.

The correlation between touch typing and audio snooping adds another layer to the multifaceted landscape of cybersecurity. It’s a reminder that our daily activities, even seemingly mundane ones, can have unexpected consequences in the digital realm. The suggestion to utilize background sounds, including music or white noise, to mask typing sounds is an innovative approach that underscores the need for creative solutions in the face of evolving threats.

As the digital world continues to evolve, staying informed about these nuances of cybersecurity becomes imperative. Learning touch typing could indeed become an additional line of defense against audio snooping, contributing to a more secure online experience.

The authors considered white noise. IIRC it isn’t good cover because it can be filtered out surprisingly well.
