Getting salty with LLMs: SophosAI unveils new defense against jailbreaking at CAMLIS 2025

On October 22-24, SophosAI will present research on ‘LLM salting’ (a novel countermeasure against jailbreaks) and command line classification at CAMLIS 2025
October 21, 2025
Scientists from the SophosAI team will present their research at the upcoming Conference on Applied Machine Learning in Information Security (CAMLIS) in Arlington, Virginia.

On October 23, Senior Data Scientist Ben Gelman will present a poster session on command line anomaly detection, research he presented at Black Hat USA 2025 and that we explored in an earlier blog post.

Senior Data Scientist Tamás Vörös will give a talk on October 22 entitled “LLM Salting: From Rainbow Tables to Jailbreaks”, discussing a lightweight defense mechanism against large language model (LLM) jailbreaks.

LLMs such as GPT, Claude, Gemini, and LLaMA are increasingly deployed with minimal customization. This widespread reuse leads to model homogeneity across applications, from chatbots to productivity tools. That homogeneity creates a security vulnerability: jailbreak prompts that bypass refusal mechanisms (guardrails that prevent a model from providing a particular kind of response) can be precomputed once and reused across many deployments. The pattern resembles the classic rainbow table attack in password security, where precomputed inputs are applied to multiple targets.

These generalized jailbreaks are a problem because many companies have customer-facing LLMs built on top of the same underlying models, meaning that one jailbreak could work against every instance built on a given model. Those jailbreaks could have multiple undesirable impacts, from exposing sensitive internal data to producing incorrect, inappropriate, or even harmful responses.

Taking their inspiration from the world of cryptography, Tamás and team have developed a new technique called ‘LLM salting’, a lightweight fine-tuning method that disrupts jailbreak reuse.
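
To make the cryptographic analogy concrete, here is a minimal sketch of why salting defeats precomputation in the password setting. It is an illustration only: a plain lookup table stands in for a real rainbow table (which uses hash chains), unsalted SHA-256 stands in for a proper password hashing scheme, and the function names, candidate list, and fixed salts are all hypothetical.

```python
import hashlib


def hash_password(password: str, salt: bytes = b"") -> str:
    """Hex SHA-256 of salt + password. Toy scheme, for illustration only."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def build_table(candidates):
    """Precompute digest -> password once, assuming no salt is used."""
    return {hash_password(p): p for p in candidates}


def crack(table, stored_hash):
    """Look up a stored hash in the precomputed table (None if absent)."""
    return table.get(stored_hash)


# The attacker does this work once and reuses it against every target.
table = build_table(["123456", "password", "letmein"])

# Two hypothetical deployments storing the same password without a salt.
unsalted_a = hash_password("letmein")
unsalted_b = hash_password("letmein")

# The same deployments, each with its own salt. The salts are fixed here
# so the example is reproducible; real salts would be random per entry.
salted_a = hash_password("letmein", b"deployment-a")
salted_b = hash_password("letmein", b"deployment-b")

print(crack(table, unsalted_a))  # letmein
print(crack(table, unsalted_b))  # letmein
print(crack(table, salted_a))    # None
print(crack(table, salted_b))    # None
```

The single precomputed table recovers the password from both unsalted targets but from neither salted one, so the attacker would have to redo the precomputation for each salt. LLM salting aims for the analogous effect on jailbreak prompts.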

Building on recent work showing that refusal behavior is governed by a single activation-space direction, LLM salting applies a small, targeted rotation to this ‘refusal direction.’ This preserves general capabilities, but invalidates precomputed jailbreaks, forcing adversaries to recompute attacks for each ‘salted’ copy of the model.
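
The geometry of a "small rotation" of a direction can be sketched in a few lines. This is not the salting method itself: that is a fine-tuning procedure whose details are not covered in this post. The example only rotates a stand-in unit vector by a chosen angle within a randomly picked plane. The 16-dimensional space, the 20-degree angle, the random choice of plane, and all names are assumptions made for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def unit(a):
    n = norm(a)
    return [x / n for x in a]


def rotate_direction(r, angle_rad, rng):
    """Rotate unit vector r by angle_rad within the plane spanned by r
    and a random direction orthogonal to r."""
    r = unit(r)
    v = [rng.gauss(0.0, 1.0) for _ in r]
    # Gram-Schmidt step: remove the component of v that lies along r.
    proj = dot(v, r)
    u = unit([vi - proj * ri for vi, ri in zip(v, r)])
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [c * ri + s * ui for ri, ui in zip(r, u)]


rng = random.Random(0)
dim = 16  # toy dimensionality; real activation spaces are far larger

# Stand-in for a refusal direction (random here, not taken from a model).
refusal_dir = unit([rng.gauss(0.0, 1.0) for _ in range(dim)])
salted_dir = rotate_direction(refusal_dir, math.radians(20), rng)

# Cosine similarity between the two directions equals cos(20 degrees).
print(round(dot(refusal_dir, salted_dir), 4))  # 0.9397
print(round(norm(salted_dir), 4))              # 1.0
```

The rotated vector stays unit length and its cosine similarity with the original is exactly the cosine of the rotation angle, so the angle directly controls how far the new direction departs from the one an attacker precomputed against. How the real method chooses and applies its targeted rotation during fine-tuning is not something this sketch attempts to reproduce.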

In their experiments, Tamás and team found that LLM salting was significantly more effective at reducing jailbreak success than standard fine-tuning and system prompt changes, making deployments more robust against attacks without sacrificing accuracy.

In his talk, Tamás will share the results of his research and the methodology of his experiments, highlighting how LLM salting can help to protect companies, model owners, and users from generalized jailbreak techniques.

We’ll publish a more detailed article on this novel defense mechanism following the talk at CAMLIS.

Tamás Vörös is a Senior Data Scientist at SophosAI, where he explores how machine learning and large language models can make cybersecurity smarter and safer. His recent projects include developing ways to harden AI models against jailbreak attacks (LLM Salting) and remove hidden backdoors (LLMBotomy).

Born and bred at SophosLabs straight out of university, Tamás has worked across web and spam protection, threat intelligence, and applied AI research. Over the years, his focus has shifted from hands-on detection systems to advancing the safety and interpretability of modern language models.

He regularly presents his research at conferences such as CAMLIS, Black Hat Europe, and BSides. Tamás holds a Master’s degree in Computer Science from Eötvös Loránd University and is currently studying Psychology at Pázmány Péter University.

Ben Gelman is a Senior Data Scientist at SophosAI researching the uses of AI/ML in cybersecurity. His work primarily focuses on alert prioritization, case analysis, and detection of malicious command lines. In prior work, he used deep learning in a variety of domains, including source code analysis, natural language processing, image recognition, data privacy, and hyperparameter optimization.

