Hidden State Poisoning Attacks against Mamba-based Language Models

Alexandre Le Mercier; Chris Develder; Thomas Demeester

arXiv:2601.01972·cs.CL·May 15, 2026


TL;DR

This paper shows that Mamba-based state space models are vulnerable to Hidden State Poisoning Attacks (HiSPAs), in which short input phrases overwrite information held in the hidden state, and it uses interpretability analysis to suggest mitigations.

Contribution

The paper introduces the HiSPA, measures its impact on Mamba models with the RoBench-25 benchmark, extends the findings to Mamba-2 and hybrid models, and provides an interpretability analysis aimed at defense.

Findings

01 Mamba models are vulnerable to HiSPAs, which cause information loss.

02 RoBench-25 effectively evaluates model robustness against HiSPAs.

03 Interpretability analysis suggests potential mitigation strategies.

Abstract

State space models (SSMs) like Mamba offer efficient alternatives to Transformer-based language models, with linear time complexity. Yet, their adversarial robustness remains critically unexplored. This paper studies the phenomenon whereby specific short input phrases induce a partial amnesia effect in such models, by irreversibly overwriting information in their hidden states, referred to as a Hidden State Poisoning Attack (HiSPA). Our benchmark RoBench-25 allows evaluating a model's information retrieval capabilities when subject to HiSPAs, and confirms the vulnerability of SSMs against such attacks. Even the recent Jamba-1.7-Mini SSM-Transformer (a 52B hybrid model) collapses on RoBench-25 under some HiSPA triggers, whereas pure Transformers do not. We also observe that HiSPA triggers significantly weaken the Jamba model on the popular Open-Prompt-Injections benchmark, unlike pure…
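
As a rough illustration of the overwriting effect described in the abstract, the toy sketch below uses a scalar gated recurrence, h_t = a_t * h_{t-1} + (1 - a_t) * x_t, where a_t is an input-dependent retention factor. This is not the paper's attack and not Mamba's actual layer: the recurrence, the function names, and the decay values are all assumptions chosen for illustration. It only shows why a single token that drives a_t close to zero leaves almost nothing of the earlier state, and why later tokens cannot bring it back.

```python
def run_scan(decays, inputs, h0=0.0):
    """Toy scalar recurrence: h_t = a_t * h_{t-1} + (1 - a_t) * x_t."""
    h = h0
    for a, x in zip(decays, inputs):
        h = a * h + (1.0 - a) * x
    return h


def retained_fraction(decays):
    """Weight of the initial state in the final state (product of all a_t)."""
    frac = 1.0
    for a in decays:
        frac *= a
    return frac


# Hypothetical values, not taken from the paper:
# benign tokens keep most of the state, one "trigger" token nearly erases it.
BENIGN_DECAY = 0.99
TRIGGER_DECAY = 0.001

benign = [BENIGN_DECAY] * 10
poisoned = [BENIGN_DECAY] * 5 + [TRIGGER_DECAY] + [BENIGN_DECAY] * 4

clean_fraction = retained_fraction(benign)
poisoned_fraction = retained_fraction(poisoned)

# Sanity check: with zero inputs, the scan applied to h0 = 1.0 leaves
# exactly the retained fraction in the state.
scanned_clean = run_scan(benign, [0.0] * len(benign), h0=1.0)
scanned_poisoned = run_scan(poisoned, [0.0] * len(poisoned), h0=1.0)

print(f"clean: {clean_fraction:.4f}  poisoned: {poisoned_fraction:.6f}")
```

With these made-up values, the clean sequence keeps about 90% of the initial state, while the sequence containing the trigger keeps less than 0.1%. Because the retained weight is a product of factors no greater than one, no later benign token can raise it again, which is the sense in which the loss is irreversible in this toy model.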


Code & Models

Repositories

https://anonymous.4open.science/r/hispa_anonymous-5DB0
