When I speak about machine learning I know I run the risk of having half the audience start rolling their eyes. In some respects, I can’t blame them.
Machine learning, along with artificial intelligence, has become such an overused buzz term in the cybersecurity industry that merely uttering the words tends to draw more skepticism than excitement. But there are real reasons we should be enthusiastic about it. In fact, we are already seeing real benefits from its application.
Machine learning is helping us change from a reactive to a proactive approach to cybersecurity. You can see this when we apply deep learning (an advanced form of machine learning) to the problem of unknown malware.
In the past, we had to wait until we had seen a piece of malware before we were able to block it. Now, using the machine learning-driven malware detection engine featured in Intercept X, we can “predict” whether a file is malicious without ever having seen it before, and with a significantly higher detection rate.
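To make the idea of “predicting” concrete, here is a toy sketch in Python. It is not how Intercept X works: it uses plain logistic regression trained by gradient descent rather than deep learning, and the two features (an entropy score and a ratio of suspicious API calls), the sample values, and all function names are invented for illustration. The point is only that a model trained on labeled examples can score a file it has never seen.

```python
import math

# Hypothetical training set: ([entropy_score, suspicious_api_ratio], label).
# Both features are assumed to be scaled to 0..1; label 1 = malicious, 0 = benign.
SAMPLES = [
    ([0.20, 0.10], 0), ([0.30, 0.00], 0), ([0.25, 0.20], 0), ([0.40, 0.10], 0),
    ([0.90, 0.80], 1), ([0.80, 0.70], 1), ([0.95, 0.60], 1), ([0.70, 0.90], 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, features):
    """Return the model's estimated probability that the file is malicious."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(samples, learning_rate=0.5, epochs=2000):
    """Fit logistic regression with batch gradient descent."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in samples:
            error = score(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # Step against the average gradient of the log loss.
        weights = [w - learning_rate * g / len(samples)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias


WEIGHTS, BIAS = train(SAMPLES)


def looks_malicious(features, threshold=0.5):
    """Classify a feature vector that need not appear in the training set."""
    return score(WEIGHTS, BIAS, features) >= threshold
```

Calling `looks_malicious([0.85, 0.75])` scores a feature vector that is absent from `SAMPLES`, which is the difference from a signature lookup: a signature can only match what has already been seen, while the model generalizes from examples.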
Over the long term, machine learning will have even greater benefits. It is no secret that as an industry we have struggled to keep up with cyber threats. The common solution has been to collect as much data as possible in order to gain visibility and insights.
The idea is that if we can get a better view of our environment, and access more data, we can use that data to detect threats that we would have otherwise missed. This is the “big data” strategy (and the other overused buzz term we were sick of before “machine learning”).
The big data strategy is a good one; however, organizations struggle to fully implement it. One reason could be that our industry is facing a massive skills shortage, making it hard to actually use all the data we collect. Those that do have a team of skilled analysts often find themselves drowning in a sea of “big data”, resulting in information overload. Instead of helping to find a needle in a haystack, it can feel as if we just made the haystack bigger.
With machine learning, we now have a technology that can automate the analysis of information so that we can make better use of the data we collect, gaining more and smarter insights. If you use the right type of machine learning, there is effectively no limit to the amount of data you can collect and still put to use.
The use of this technology for analysis and automation can bridge the cybersecurity labor gap, giving us an advantage over attackers, rather than constantly being a step behind.
We know we cannot spread magic machine learning dust on everything and call it a day, but that shouldn’t make us less excited about its potential.
Anonymous
OK, so “Big Data” (a concept that no-one understands) is supposed to be leveraged to perform “Machine Learning” (a concept that no-one understands) and automagically solve our IR information overload by automating it (IR), and presto, there goes a fancy product sell. Give me a break.
For now I have not seen a single endpoint vendor able to demonstrate the added value or even the mere existence of a supposed “ML capability” embedded into their product. Behavioural analytics, yes, I have seen it in action. But your marketing relies on a cumulated assumption and no technical evidence. I do however agree with that statement: “Instead of helping to find a needle in a haystack, it can feel as if we just made the haystack bigger.”
This is not helping the grunts on the field, is it?
Seth Geftic
Your comment in many ways validates my point. So much marketing spin has made people treat machine learning with animosity. There is ample proof, both within and outside of cybersecurity, that machine learning can improve detection models. This includes third-party testing (https://www.mrg-effitas.com/wp-content/uploads/2018/02/MRG_Comparative_2018_February_report.pdf), internal research (https://www.sophos.com/en-us/labs.aspx), and even the data we track on repositories like VirusTotal.
Also, it’s important that we realize machine learning isn’t a “product”, it’s a feature. This is why we included it as part of Intercept X, and did not introduce it as a standalone offering. It won’t work “automagically”, but it is an important tool that will help us improve. That is why we should stop rolling our eyes when we hear about it.
Patrick
You know, using the buzz word and explaining in simple terms what it means and what it can do is nice and all, but I’m an engineer. I want to know more. What kind of model do you use? How does it improve, using gradient descent? Backpropagation? Genetic algs? Where can I see the nitty-gritty of the training for your model? Such a long running project has to have learned things that can benefit the CS community at large.
Mark Stockley
If the articles on Sophos News under the Machine Learning tag aren’t giving you what you need, take a look at the Sophos whitepaper “Machine Learning: How to Build a Better Threat Detection Model”.
https://www.sophos.com/en-us/medialibrary/PDFs/technical-papers/machine-learning-how-to-build-a-better-threat-detection-model.pdf
And if you want to go deeper still, check out “Malware Data Science”, a book by two Sophos data scientists – Joshua Saxe and Hillary Sanders.
https://nostarch.com/malwaredatascience