Scientists from the SophosAI team will present their research at the upcoming Conference on Applied Machine Learning in Information Security (CAMLIS) in Arlington, Virginia.
On October 23, Senior Data Scientist Ben Gelman will present a poster session on command line anomaly detection, research he previously presented at Black Hat USA 2025 and which we explored in a previous blog post.
Senior Data Scientist Tamás Vörös will give a talk on October 22 entitled “LLM Salting: From Rainbow Tables to Jailbreaks”, discussing a lightweight defense mechanism against large language model (LLM) jailbreaks.
LLMs such as GPT, Claude, Gemini, and LLaMA are increasingly deployed with minimal customization. This widespread reuse creates model homogeneity across applications, from chatbots to productivity tools, and with it a security vulnerability: jailbreak prompts that bypass refusal mechanisms (the guardrails that prevent a model from producing certain kinds of responses) can be precomputed once and reused across many deployments. This is similar to the classic rainbow table attack in password security, where precomputed values are applied to multiple targets.
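To make the analogy concrete, here is a toy Python sketch (with hypothetical passwords and a made-up salt, not anything from the research) of why a per-deployment salt defeats precomputed password attacks; the same reasoning motivates salting a model.

```python
import hashlib

def hash_pw(password: str, salt: str = "") -> str:
    """Hash a password, optionally prefixed with a salt."""
    return hashlib.sha256((salt + password).encode()).hexdigest()

# The attacker precomputes hashes for common passwords once (a "rainbow table")...
rainbow_table = {hash_pw(p): p for p in ["123456", "password", "letmein"]}

# ...and can reuse that single table against any unsalted target.
stolen_unsalted = hash_pw("letmein")
print(rainbow_table.get(stolen_unsalted))   # "letmein": the precomputed work pays off

# With a per-deployment salt, the same table no longer matches, and the
# attacker has to redo the precomputation for every salt.
stolen_salted = hash_pw("letmein", salt="deployment-42")
print(rainbow_table.get(stolen_salted))     # None: the precomputed table is useless here
```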
These generalized jailbreaks are a problem because many companies build customer-facing LLMs on top of the same base models, meaning that a single jailbreak could work against every instance derived from a given model. Those jailbreaks can have a range of undesirable impacts, from exposing sensitive internal data to producing incorrect, inappropriate, or even harmful responses.
Taking their inspiration from the world of cryptography, Tamás and his team have developed a new technique called ‘LLM salting’, a lightweight fine-tuning method that disrupts jailbreak reuse.
Building on recent work showing that refusal behavior is governed by a single activation-space direction, LLM salting applies a small, targeted rotation to this ‘refusal direction.’ This preserves general capabilities, but invalidates precomputed jailbreaks, forcing adversaries to recompute attacks for each ‘salted’ copy of the model.
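As a rough sketch of the core idea (not the team's actual implementation), the snippet below rotates a placeholder refusal direction by a small angle within a plane chosen from a per-deployment seed; the vector dimension, angle, and seed are all assumptions made for illustration.

```python
import numpy as np

def salt_direction(direction: np.ndarray, angle_rad: float, seed: int) -> np.ndarray:
    """Rotate `direction` by `angle_rad` in a plane spanned by the direction
    and a random orthogonal axis derived from `seed` (the per-deployment 'salt')."""
    d = direction / np.linalg.norm(direction)

    # Draw a seeded random vector and remove its component along d to get an
    # orthogonal axis; different seeds yield different rotation planes.
    rng = np.random.default_rng(seed)
    r = rng.standard_normal(d.shape)
    r -= (r @ d) * d
    r /= np.linalg.norm(r)

    # Rotate d by angle_rad within the (d, r) plane.
    return np.cos(angle_rad) * d + np.sin(angle_rad) * r

# Example: a placeholder 4096-dimensional 'refusal direction'. Two deployments
# salted with different seeds end up with slightly different directions, so an
# attack tuned against the original direction no longer lines up exactly.
base = np.random.default_rng(0).standard_normal(4096)
salted_a = salt_direction(base, angle_rad=0.1, seed=101)
salted_b = salt_direction(base, angle_rad=0.1, seed=202)
print(float(salted_a @ (base / np.linalg.norm(base))))  # ~cos(0.1): close to, but not equal to, the original
print(float(salted_a @ salted_b))                       # the two salted copies also differ from each other
```

The small angle is the point: the rotation is large enough to break alignment with precomputed jailbreaks, but small enough that the model's general behavior is left intact.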
In their experiments, Tamás and his team found that LLM salting was significantly more effective at reducing jailbreak success than standard fine-tuning or system prompt changes, making deployments more robust against attacks without sacrificing accuracy.
In his talk, Tamás will share the results of his research and the methodology of his experiments, highlighting how LLM salting can help to protect companies, model owners, and users from generalized jailbreak techniques.
We’ll publish a more detailed article on this novel defense mechanism following the talk at CAMLIS.