Site icon Sophos News

The sixth sense for cyber defense: Multimodal AI

At the 2024 Virus Bulletin conference, Sophos Principal Data Scientist Younghoo Lee presented a paper on SophosAI’s research into ‘multimodal’ AI (a system that integrates diverse data types into a unified analytical framework). In his talk, Lee explored the team’s novel empirical research on applying multimodal AI to the detection of spam, phishing, and unsafe web content.

What is multimodal AI?

Multimodal AI represents a significant shift in artificial intelligence. Rather than traditional single-mode analysis, multimodal systems can process multiple data streams simultaneously, synthesizing data from multiple inputs.

In the context of cybersecurity – and particularly when it comes to classifying threats – this is a powerful capability. Rather than analyzing textual and visual content separately, a multimodal system can process both, and ‘understand’ the intricate relationships between them.

For example, in phishing detection, multimodal AI examines the linguistic patterns and writing style of the text alongside the visual fidelity of logos and branding elements, while also analyzing the semantic consistency between textual and visual components. This holistic approach means that the system can identify sophisticated attacks that might appear, to more traditional systems, to be legitimate. Moreover, multimodal AI can learn from, and adapt to, the correlations between different data types, developing a sense of how legitimate and malicious content differs across multiple dimensions.

Capabilities

In his research, Lee details some of the detection capabilities of multimodal AI systems:

Text analysis and natural language understanding

Visual intelligence and brand verification

Advanced URL and security analysis

Case study: A fake Costco email

The below image is a genuine phishing attempt, designed to trick recipients into thinking that they have won a prize from Costco. The email looks official, complete with imitated Costco logo and branding.

A screenshot of an email taken on a mobile device. The email is imitating a genuine email from Costco and tells the recipient that they have won a prize. There is a blue button in the centre of the email inviting the user to click

Figure 1: A screenshot of a phishing email, purportedly from Costco

Multimodal AI can identify several suspicious aspects of this email, including:

As a result, the system assigns a high score to the email, flagging it as suspicious.

SophosAI also applied multimodal AI to NSFW (not safe for work) websites containing content relating to gambling, weapons, and more. As with the classification of phishing emails, detection leverages a number of capabilities, including the evaluation of keywords and phrases (agnostic of language), and analysis of imagery and graphics.

Experimental results

To test the efficacy of multimodal AI compared to traditional machine learning models such as Random Forest and XGBoost, SophosAI conducted a series of empirical experiments. The full results are available in Lee’s whitepaper and Virus Bulletin talk – but, briefly, traditional models performed well when detecting known threats, and struggled with new, unseen phishing emails. Their F1 scores (a measure that balances precision and recall to give an overall representation of accuracy between 0 and 1) were as low as 0.53 with unseen samples, reaching a high of 0.66. In contrast, multimodal AI (using GPT-4o) performed very well in detecting new phishing attempts, achieving F1 scores up to 0.97 even on unseen brands.

It was a similar story with NSFW content; traditional models achieved F1 scores of around 0.84-0.88, but models with multimodal AI embeddings achieved scores of up to 0.96.

Conclusion

The digital landscape is in a state of constant evolution, bringing with it an array of new threats – including the use of generative AI to deceive users. Phishing emails now meticulously, and routinely, mimic legitimate communications, while NSFW websites conceal harmful content behind deceptive visuals. While traditional cybersecurity methods remain important, they are increasingly inadequate on their own. Multimodal AI offers an innovative layer of defense that enhances our comprehension of content.

By effectively detecting sophisticated phishing emails and accurately classifying NSFW websites, multimodal AI not only protects users more effectively but also adapts to new threats. The experimental results Lee presents in his paper show significant improvements over traditional methods.

Going forward, incorporating multimodal AI into cybersecurity strategies is not just beneficial; it is crucial for ensuring the protection of our digital environment amid growing complexities and threats.

For further information, Lee’s full whitepaper is available here. A recording of his 2024 Virus Bulletin talk is available here (along with the slides).

Exit mobile version