Mirror, mirror on the wall, which is the worst side-channel vulnerability of them all?
For a while it was Meltdown and Spectre, the two biggies that kicked off the era of microprocessor security worry in early 2018, followed some months later by another contender, PortSmash.
In May this year, news emerged of more weaknesses with fancy names – ZombieLoad (CVE-2018-12130), RIDL, and Fallout (CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091).
The thread loosely holding this list together is a new class of weaknesses known as Microarchitectural Data Sampling (MDS) flaws, found, in the case of PortSmash and ZombieLoad, in Intel’s implementation of Simultaneous Multithreading (SMT), which it brands hyper-threading.
When Intel introduced it nearly 20 years ago, SMT was promoted as a clever way of boosting processor performance.
In the absence of patches, the simplest way to mitigate the numerous security issues stemming from hyper-threading was to turn it off via the BIOS, something researchers initially estimated would cause a performance drop of up to 30% for datacentre installations, depending on which flaw was being addressed.
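For readers on Linux who want to see where a machine stands before reaching for the BIOS, here is a minimal sketch in Python. It assumes a reasonably recent kernel that exposes SMT state and MDS status under /sys/devices/system/cpu; older kernels and other operating systems will not have these files, in which case the sketch reports nothing rather than guessing. The function names (read_sysfs, smt_report) are ours, purely for illustration.

```python
from pathlib import Path

# Assumption: a recent Linux kernel that exposes these sysfs entries.
CPU_SYSFS = Path("/sys/devices/system/cpu")


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if it is absent or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def smt_report(base=CPU_SYSFS):
    """Collect SMT state and the kernel's MDS status line from the given sysfs directory."""
    base = Path(base)
    return {
        # e.g. "on", "off", "forceoff", "notsupported"
        "smt_control": read_sysfs(base / "smt" / "control"),
        # "1" if sibling threads are currently online, "0" otherwise
        "smt_active": read_sysfs(base / "smt" / "active"),
        # free-text status line written by the kernel
        "mds": read_sysfs(base / "vulnerabilities" / "mds"),
    }


if __name__ == "__main__":
    for name, value in smt_report().items():
        print(f"{name}: {value if value is not None else 'not reported'}")
```

On kernels that support it, root can also switch SMT off at runtime by writing "off" to the smt/control file, which avoids a reboot but does not persist across one.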
Lock it up
During 2018, the maintainers of security-first operating system OpenBSD started recommending turning SMT off if it was being used in certain types of installation – just patching it on a piecemeal basis wasn’t enough.
An easy-to-miss mainstream follow-up to that was Google’s 2019 decision to disable hyper-threading in Chrome OS 74 on its Chromebooks, a move it followed with additional mitigations in later versions.
By now, the SMT fire was burning on several fronts, stoked in particular by comments from the maintainer of the stable branch of Linux, Greg Kroah-Hartman. In May, he summed up a year of doubt about SMT:
As I said before just over a year ago, Intel once again owes a bunch of people a lot of drinks for fixing their hardware bugs, in our software…
Only days ago, Kroah-Hartman came back with another salvo in comments to The Register:
A year ago, they [OpenBSD] said disable hyper-threading, there’s going to be lots of problems here. They chose security over performance at an earlier stage than anyone else. Disable hyper-threading. That’s the only way you can solve some of these issues. We are slowing down your workloads. Sorry.
And there is no way of dodging the performance hit either:
I see a slowdown of about 20 per cent. That’s real. As kernel developers we fight for a 1 per cent, 2 per cent speed increase. Put these security things in, and we go back like a year in performance. It’s sad.
A performance hit that big could cause major issues for datacentres, to the extent that they might have to consider leaving hyper-threading turned on and taking the risk.
Encouraging that wait-and-see response is the fact that no attacks exploiting issues such as ZombieLoad have been reported.
That might be because attackers have yet to figure out how to do that or because detecting side-channel attacks is difficult, or even impossible, once a compromise fundamental enough to reach microprocessor level has been attained.
But when someone like Kroah-Hartman starts talking about performance as a necessary sacrifice – possibly for many years to come – perhaps we should listen.
What’s become apparent is that patching side-channel issues is a microprocessor problem with no simple answer.
Customers will carry on patching the issues that pop up, caught in a sort of dented version of Moore’s Law where microprocessor performance continues to rise exponentially for some customers, but not others.
Anonymous
You can certainly do better, and review your article. MDS can’t be disabled, so Google didn’t disable it on the Chromebooks. SMT, on the other hand, was indeed disabled.
Mark Stockley
You are correct, I’ve updated the article to say that hyper-threading was disabled, not MDS.
Anonymous
Now that the heads on fire act is beginning to subside and Meltdown was cured with ASLR (duh! really obvious solution, why did it take Meltdown to implement this?), why can’t we just assume that side channel attacks are not going away and an alternative approach is needed? Just because some researcher has worked out how to manipulate branch prediction to see information that they shouldn’t be able to see, surely the answer is to encrypt each of the branches with a key that only that branch has access to? We’ve already solved this with https and key exchanges – just implement it in the kernel. How you manage the keys is another issue, but come on, how else can this be done effectively?
Clear all the crap out of the kernel, carry on optimising it, and implement some form of encryption. These problems will not go away with existing hardware and nor will they go away with newer architecture – perhaps a fix will work for a while until the new scary logo and appropriate name get thought up (dibs on Spook / Phantom / Wraith). It’s the kernel that’s flawed because it assumes the hardware is without flaw which is unlikely. Change the functionality of the kernel and assume the hardware will always be flawed. Problem solved.
Paul Ducklin
Maybe RISC will come back into fashion?!? (RISC = ‘reduced instruction set computing’, where you take the complexity and performance tweaks out of the hardware and concentrate on raw, low-level speed supported by cleverly-written software and smart compilers.)