From static to adaptive: immune memory-based jailbreak detection for large language models

Jun Leng; Yu Liu; Litian Zhang; Ruihan Hu; Zhuting Fang; Xi Zhang

arXiv:2512.03356 · cs.CR · January 13, 2026

Open Access

TL;DR

This paper introduces IMAG, an immune memory-inspired adaptive framework for detecting and mitigating jailbreak attacks on large language models, improving robustness and adaptability over static methods.

Contribution

The paper proposes a novel immune memory-based framework that enables LLMs to adaptively detect and respond to evolving jailbreak attacks, surpassing static detection methods.

Findings

01. Achieves 94% average detection accuracy across diverse attacks

02. Outperforms state-of-the-art static detection baselines

03. Demonstrates effective adaptive defense across multiple LLMs

Abstract

Large Language Models (LLMs) serve as the backbone of modern AI systems, yet they remain susceptible to adversarial jailbreak attacks. Consequently, robust detection of such malicious inputs is paramount for ensuring model safety. Traditional detection methods typically rely on external models trained on fixed, large-scale datasets, which often incur significant computational overhead. Recent methods instead leverage models' internal safety signals to enable more lightweight and efficient detection. However, these methods remain inherently static and struggle to adapt to the evolving nature of jailbreak attacks. Drawing inspiration from the biological immune mechanism, we introduce the Immune Memory Adaptive Guard (IMAG) framework. By distilling and encoding safety patterns into a persistent, evolvable memory bank, IMAG enables adaptive generalization to emerging threats.…
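The abstract gives no algorithmic details, so the following is only a minimal, hypothetical sketch of the general idea of detecting attacks by looking them up in an evolvable memory bank; it is not IMAG itself. It assumes each prompt has already been mapped to a feature vector (for example, one derived from a model's internal signals). The class name `MemoryBank`, the cosine-similarity lookup, the 0.8 threshold, and the toy 3-dimensional vectors are all illustrative assumptions. Persistence to disk and any pattern-distillation step are omitted.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


@dataclass
class MemoryBank:
    """Hypothetical store of feature vectors for known attack patterns (in memory only)."""
    threshold: float = 0.8  # assumed similarity cutoff for flagging an input
    patterns: list[list[float]] = field(default_factory=list)

    def score(self, features: list[float]) -> float:
        """Highest similarity between the input and any stored pattern (0.0 if memory is empty)."""
        return max((cosine(features, p) for p in self.patterns), default=0.0)

    def is_jailbreak(self, features: list[float]) -> bool:
        """Flag the input if it resembles a remembered attack closely enough."""
        return self.score(features) >= self.threshold

    def remember(self, features: list[float]) -> None:
        """Add a newly confirmed attack so similar future inputs are recognized."""
        self.patterns.append(list(features))


if __name__ == "__main__":
    bank = MemoryBank(threshold=0.8)
    novel_attack = [0.1, 0.9, 0.2]            # toy feature vector (hypothetical)
    print(bank.is_jailbreak(novel_attack))     # False: nothing remembered yet
    bank.remember(novel_attack)                # "immune memory" of the attack
    print(bank.is_jailbreak([0.12, 0.88, 0.25]))  # True: close variant is now caught
    print(bank.is_jailbreak([0.9, 0.1, 0.0]))     # False: dissimilar input passes
```

The design choice being illustrated is that the detector's behavior changes without retraining: once an attack is confirmed and stored, nearby variants are flagged on the next lookup, which is the contrast with a static detector that the abstract draws.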

Taxonomy

Topics: Artificial Immune Systems Applications · Adversarial Robustness in Machine Learning · Vaccines and Immunoinformatics Approaches