Serious Security: Why learning to touch-type could protect you from audio snooping

Audio recordings are dangerously easy to make these days, whether by accident or by design.

You could end up with your own permanent copy of something you thought you were discussing privately, preserved indefinitely in an uninterestingly named file on your phone or laptop, thanks to hitting “Record” by mistake.

Someone else could end up with a permanent transcript of something you didn’t want preserved at all, thanks to them hitting “Record” on their phone or laptop in a way that wasn’t obvious.

Or you could knowingly record a meeting for later, just in case, with the apparent consent of everyone (or at least without any active objections from anyone), but not get round to deleting it from cloud storage until it’s too late.

Sneaky sound systems

Compared to video recordings, which are worrying enough given how easily they can be captured covertly, audio recordings are much easier to acquire surreptitiously, given that sound “goes round corners” while light, generally speaking, doesn’t.

A mobile phone laid flat on a desk and pointing directly upwards, for example, can reliably pick up most of the sounds in a room, even those coming from people and their computers that would be entirely invisible to the phone’s camera.

Likewise, your laptop microphone will record an entire room, even if everyone else is on the other side of the table, looking at the back of your screen.

Worse still, someone who isn’t in the room at all but is participating via a service such as Zoom or Teams can hear everything relayed from your side whenever your own microphone isn’t muted.

Remote meeting participants can permanently record whatever they receive from your end, and can do so without your knowledge or consent if they capture the audio stream without using the built-in features of the meeting software itself.

And that raises the long-running question, “What can audio snoops figure out, over and above what gets said in the room?”

What about any typing you do while the meeting is underway, perhaps because you’re taking notes, or because you happen to type in your password part-way through, for example to unlock your laptop after your screen saver decided you were AFK?

Attacks only ever get better

Recovering keystrokes from surreptitious recordings is not a new idea, and results in recent years have been surprisingly good, not least because, as the old security saying goes, attacks only ever get better.

A trio of British computer scientists (it seems they originally met up at Durham University in the North East of England, but are now spread out across the country) has just released a review-and-research paper on this very issue, entitled A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards.

In the paper, the researchers claim to have:

…achieved a top-1 classification accuracy of 95% on phone-recorded laptop keystrokes, representing improved results for classifiers not utilising language models and the second best accuracy seen across all surveyed literature.

In other words, their work isn’t entirely new, and they’re not yet in the number-one spot overall, but the fact that their keystroke recognition techniques don’t use “language models” has an important side-effect.
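In case the jargon is unfamiliar: top-1 accuracy simply measures how often the classifier’s single best guess is the right key. Here’s a toy illustration in Python, using made-up keys and guesses rather than the researchers’ data:

    # Top-1 accuracy: the fraction of keystrokes where the single best guess is right.
    # The keys and guesses below are invented purely for illustration.
    true_keys  = ["p", "a", "s", "s", "w", "o", "r", "d"]
    best_guess = ["p", "a", "s", "d", "w", "o", "r", "d"]   # one wrong guess

    top1 = sum(t == g for t, g in zip(true_keys, best_guess)) / len(true_keys)
    print(f"top-1 accuracy: {top1:.0%}")   # 88% for this made-up example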

Language models, loosely speaking, help to reconstruct poor-quality data that follows known patterns, such as being written in English, by making likely corrections automatically, such as figuring out that text recognised as dada brech notidifivatipn is very likely to be data breach notification.

But this sort of automated correction isn’t much use on passwords: even passphrases often contain only word fragments or initialisms, and the variety we deliberately throw into passwords, such as mixed-case letters or arbitrary punctuation marks, can’t reliably be “corrected” precisely because it doesn’t follow any pattern.
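To make that concrete, here’s a minimal sketch that uses Python’s standard-library difflib as a crude stand-in for a language model, snapping each recognised token to the nearest word in a small dictionary. The word list and the sample strings are our own inventions, not anything from the paper; the point is simply that garbled English snaps back into shape, while a pseudorandom password gives the “corrector” nothing to work with:

    import difflib

    # A crude stand-in for a language model: snap each recognised token to the
    # closest dictionary word, provided some candidate is close enough.
    DICTIONARY = ["data", "breach", "notification", "password", "keyboard", "meeting"]

    def correct(token):
        match = difflib.get_close_matches(token.lower(), DICTIONARY, n=1, cutoff=0.6)
        return match[0] if match else token   # leave the token alone if nothing is close

    # Garbled English snaps back to something sensible...
    print(" ".join(correct(t) for t in "dada brech notidifivatipn".split()))
    # -> data breach notification

    # ...but a pseudorandom password offers no patterns to exploit.
    print(" ".join(correct(t) for t in "xK7$pq Vw9!tz".split()))
    # -> xK7$pq Vw9!tz   (left untouched: nothing in the dictionary is close)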

So a top-tier “hey, you just hit the P key” recogniser that doesn’t rely on knowing or guessing what letters you typed just beforehand or just afterwards…

…is likely to do a better job of figuring out or guessing any unstructured, pseudorandom stuff that you type in, such as when you are entering a password.

One size fits all

Intriguingly, and importantly, the researchers noted that the representative audio samples they captured carefully from their chosen device, a 2021-model Apple MacBook Pro 16″, turned out not to be specific to the laptop they used.

In other words, because laptops of the same model tend to use as-good-as-identical components, attackers don’t need to get physical access to your laptop first in order to capture the starting data needed to train their keystroke recognition tools.

Assuming you and I have similar sorts of laptop, with the same model of keyboard installed, then any “sound signatures” that I capture under carefully controlled conditions from my own computer…

…can probably be applied more or less directly to live recordings later acquired from your keyboard, given the physical and acoustic similarities of the hardware.
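To illustrate the principle (and emphatically not to reproduce the researchers’ deep-learning pipeline), the sketch below builds frequency-domain “signatures” from labelled keystroke clips captured on one laptop, then classifies clips from a second, similar laptop by nearest match. Everything here is an assumption for the sake of the demo: the clips are synthesised tones plus noise, the 16 kHz sample rate is arbitrary, and the nearest-match classifier is our own simplification:

    import numpy as np

    rng = np.random.default_rng(42)
    SAMPLE_RATE = 16_000          # assumed sample rate for the toy clips
    CLIP_SAMPLES = 2_048          # roughly 0.13 seconds per keystroke clip

    def signature(clip):
        """Reduce a keystroke clip to a normalised magnitude spectrum."""
        spectrum = np.abs(np.fft.rfft(clip * np.hanning(len(clip))))
        return spectrum / (np.linalg.norm(spectrum) + 1e-12)

    def fake_keystroke(key, jitter=0.05):
        """Synthesise a clip whose dominant frequency depends on the key.

        A stand-in for real recordings; two 'identical' laptops differ only by noise.
        """
        freq = 400 + 50 * (ord(key) - ord("a"))   # arbitrary tone per key
        t = np.arange(CLIP_SAMPLES) / SAMPLE_RATE
        return np.sin(2 * np.pi * freq * t) + jitter * rng.standard_normal(CLIP_SAMPLES)

    # "Attacker's" reference signatures, captured carefully on laptop A...
    reference = {k: signature(fake_keystroke(k)) for k in "abcdefghij"}

    # ...then applied to noisier clips "recorded" later from laptop B.
    def classify(clip):
        sig = signature(clip)
        return max(reference, key=lambda k: float(np.dot(sig, reference[k])))

    hits = sum(classify(fake_keystroke(k, jitter=0.3)) == k for k in "abcdefghij")
    print(f"{hits}/10 keystrokes recovered")   # typically 10/10 on this toy data

In real life, of course, telling keystrokes apart is far harder than telling toy tones apart, which is why the researchers reached for deep learning in the first place; but the transferability argument is the same: if the reference signatures come from hardware that’s acoustically close enough to yours, the attacker never needs to touch your machine.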

What to do?

Here are some fascinating suggestions based on the findings in the paper: