Serious Security: Why learning to touch-type could protect you from audio snooping

Audio recordings are dangerously easy to make these days, whether by accident or by design.

You could end up with your own permanent copy of something you thought you were discussing privately, preserved indefinitely in an uninterestingly named file on your phone or laptop, thanks to hitting “Record” by mistake.

Someone else could end up with a permanent transcript of something you didn’t want preserved at all, thanks to them hitting “Record” on their phone or laptop in a way that wasn’t obvious.

Or you could knowingly record a meeting for later, just in case, with the apparent consent of everyone (or at least without any active objections from anyone), but not get round to deleting it from cloud storage until it’s too late.

Sneaky sound systems

Compared to video recordings, which are worrying enough given how easily they can be captured covertly, audio recordings are much easier to acquire surreptitiously, given that sound “goes round corners” while light, generally speaking, doesn’t.

A mobile phone laid flat on a desk and pointing directly upwards, for example, can reliably pick up most of the sounds in a room, even those coming from people and their computers that would be entirely invisible to the phone’s camera.

Likewise, your laptop microphone will record an entire room, even if everyone else is on the other side of the table, looking at the back of your screen.

Worse still, someone who isn’t in the room at all but is participating via a service such as Zoom or Teams can hear everything relayed from your side whenever your own microphone isn’t muted.

Remote meeting participants can permanently record whatever they receive from your end, and can do so without your knowledge or consent if they capture the audio stream without using the built-in features of the meeting software itself.

And that raises the long-running question, “What can audio snoops figure out, over and above what gets said in the room?”

What about any typing you do while the meeting is underway, perhaps because you’re taking notes, or because you happen to type in your password part-way through, for example to unlock your laptop after your screen saver decided you were AFK?

Attacks only ever get better

Recovering keystrokes from surreptitious recordings is not a new idea, and results in recent years have been surprisingly good, not least because, as the old security saying goes, attacks only ever get better.

A trio of British computer scientists (it seems they originally met up at Durham University in the North East of England, but are now spread out across the country) has just released a review-and-research paper on this very issue, entitled A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards.

In the paper, the researchers claim to have:

…achieved a top-1 classification accuracy of 95% on phone-recorded laptop keystrokes, representing improved results for classifiers not utilising language models and the second best accuracy seen across all surveyed literature.

In other words, their work isn’t entirely new, and they’re not yet in the number-one spot overall, but the fact that their keystroke recognition techniques don’t use “language models” has an important side-effect.
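In case the jargon is unfamiliar: top-1 accuracy simply measures how often the classifier’s single best guess is the right key. Here’s a toy illustration in Python, using made-up keys and guesses rather than the researchers’ data:

    # Top-1 accuracy: the fraction of keystrokes where the single best guess is right.
    # The keys and guesses below are invented purely for illustration.
    true_keys  = ["p", "a", "s", "s", "w", "o", "r", "d"]
    best_guess = ["p", "a", "s", "d", "w", "o", "r", "d"]   # one wrong guess

    top1 = sum(t == g for t, g in zip(true_keys, best_guess)) / len(true_keys)
    print(f"top-1 accuracy: {top1:.0%}")   # 88% for this made-up example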

Language models, loosely speaking, help to reconstruct poor-quality data that follows known patterns, such as being written in English, by making likely corrections automatically, such as figuring out that text recognised as dada brech notidifivatipn is very likely to be data breach notification.

But this sort of automated correction isn’t much use on passwords: even passphrases often contain only word fragments or initialisms, and the variety we deliberately throw into passwords, such as mixed-case letters or arbitrary punctuation marks, can’t reliably be “corrected” precisely because it doesn’t follow any pattern.
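To make that concrete, here’s a minimal sketch that uses Python’s standard-library difflib as a crude stand-in for a language model, snapping each recognised token to the nearest word in a small dictionary. The word list and the sample strings are our own inventions, not anything from the paper; the point is simply that garbled English snaps back into shape, while a pseudorandom password gives the “corrector” nothing to work with:

    import difflib

    # A crude stand-in for a language model: snap each recognised token to the
    # closest dictionary word, provided some candidate is close enough.
    DICTIONARY = ["data", "breach", "notification", "password", "keyboard", "meeting"]

    def correct(token):
        match = difflib.get_close_matches(token.lower(), DICTIONARY, n=1, cutoff=0.6)
        return match[0] if match else token   # leave the token alone if nothing is close

    # Garbled English snaps back to something sensible...
    print(" ".join(correct(t) for t in "dada brech notidifivatipn".split()))
    # -> data breach notification

    # ...but a pseudorandom password offers no patterns to exploit.
    print(" ".join(correct(t) for t in "xK7$pq Vw9!tz".split()))
    # -> xK7$pq Vw9!tz   (left untouched: nothing in the dictionary is close)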

So a top-tier “hey, you just hit the P key” recogniser that doesn’t rely on knowing or guessing what letters you typed just beforehand or just afterwards…

…is likely to do a better job of figuring out or guessing any unstructured, pseudorandom stuff that you type in, such as when you are entering a password.

One size fits all

Intriguingly, and importantly, the researchers noted that the representative audio samples they captured carefully from their chosen device, a 2021-model Apple MacBook Pro 16″, turned out not to be specific to the laptop they used.

In other words, because laptops of the same model tend to use as-good-as-identical components, attackers don’t need to get physical access to your laptop first in order to capture the starting data needed to train their keystroke recognition tools.

Assuming you and I have similar sorts of laptop, with the same model of keyboard installed, then any “sound signatures” that I capture under carefully controlled conditions from my own computer…

…can probably be applied more or less directly to live recordings later acquired from your keyboard, given the physical and acoustic similarities of the hardware.
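To illustrate the principle (and emphatically not to reproduce the researchers’ deep-learning pipeline), the sketch below builds frequency-domain “signatures” from labelled keystroke clips captured on one laptop, then classifies clips from a second, similar laptop by nearest match. Everything here is an assumption for the sake of the demo: the clips are synthesised tones plus noise, the 16 kHz sample rate is arbitrary, and the nearest-match classifier is our own simplification:

    import numpy as np

    rng = np.random.default_rng(42)
    SAMPLE_RATE = 16_000          # assumed sample rate for the toy clips
    CLIP_SAMPLES = 2_048          # roughly 0.13 seconds per keystroke clip

    def signature(clip):
        """Reduce a keystroke clip to a normalised magnitude spectrum."""
        spectrum = np.abs(np.fft.rfft(clip * np.hanning(len(clip))))
        return spectrum / (np.linalg.norm(spectrum) + 1e-12)

    def fake_keystroke(key, jitter=0.05):
        """Synthesise a clip whose dominant frequency depends on the key.

        A stand-in for real recordings; two 'identical' laptops differ only by noise.
        """
        freq = 400 + 50 * (ord(key) - ord("a"))   # arbitrary tone per key
        t = np.arange(CLIP_SAMPLES) / SAMPLE_RATE
        return np.sin(2 * np.pi * freq * t) + jitter * rng.standard_normal(CLIP_SAMPLES)

    # "Attacker's" reference signatures, captured carefully on laptop A...
    reference = {k: signature(fake_keystroke(k)) for k in "abcdefghij"}

    # ...then applied to noisier clips "recorded" later from laptop B.
    def classify(clip):
        sig = signature(clip)
        return max(reference, key=lambda k: float(np.dot(sig, reference[k])))

    hits = sum(classify(fake_keystroke(k, jitter=0.3)) == k for k in "abcdefghij")
    print(f"{hits}/10 keystrokes recovered")   # typically 10/10 on this toy data

In real life, of course, telling keystrokes apart is far harder than telling toy tones apart, which is why the researchers reached for deep learning in the first place; but the transferability argument is the same: if the reference signatures come from hardware that’s acoustically close enough to yours, the attacker never needs to touch your machine.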

What to do?

Here are some fascinating suggestions based on the findings in the paper: