CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks

Alexandre Le Mercier; Thomas Demeester; Chris Develder

arXiv:2603.12206 · cs.CL · March 30, 2026

TL;DR

CLASP is a lightweight classifier that detects hidden state poisoning attacks (HiSPAs) against state space models such as Mamba, helping protect hybrid language models from adversarial manipulation.

Contribution

Introduces CLASP (Classifier Against State Poisoning), a token-level classifier that recognizes patterns in Mamba's block output embeddings to defend SSMs from HiSPAs, with strong generalization and real-time processing capabilities.
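The token-level framing can be illustrated with a toy example. This is a minimal sketch under loud assumptions, not the authors' implementation: the feature vectors are synthetic stand-ins for block output embeddings, the helper names (`fit_stump`, `predict_tokens`, `fake_embedding`) are hypothetical, and a single decision stump is swapped in for the XGBoost classifier named in the abstract so that the example needs only the standard library.

```python
import random

def fit_stump(features, labels):
    """Pick the (dimension, threshold) pair with the fewest training errors.

    The stump predicts 1 (malicious) when features[dim] > threshold.
    A stump is only a stand-in here; the paper uses XGBoost.
    """
    best = None  # (errors, dim, threshold)
    for dim in range(len(features[0])):
        for threshold in sorted({row[dim] for row in features}):
            errors = sum(
                int(row[dim] > threshold) != label
                for row, label in zip(features, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, dim, threshold)
    return best[1], best[2]

def predict_tokens(stump, features):
    """Return one binary label per token (1 = flagged as malicious)."""
    dim, threshold = stump
    return [int(row[dim] > threshold) for row in features]

# Synthetic data (assumption): malicious tokens are shifted along dimension 2.
rng = random.Random(0)

def fake_embedding(malicious):
    vec = [rng.gauss(0.0, 1.0) for _ in range(4)]
    if malicious:
        vec[2] += 8.0
    return vec

train_labels = [i % 2 for i in range(200)]
train_features = [fake_embedding(bool(y)) for y in train_labels]
stump = fit_stump(train_features, train_labels)

test_labels = [i % 2 for i in range(100)]
test_features = [fake_embedding(bool(y)) for y in test_labels]
predicted = predict_tokens(stump, test_features)
accuracy = sum(p == y for p, y in zip(predicted, test_labels)) / len(test_labels)
```

The point of the sketch is the shape of the problem: one feature vector per token goes in, one binary label per token comes out. Real embeddings would not separate along a single hand-picked dimension, which is why a stronger learner such as boosted trees is a natural choice.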

Findings

01

CLASP achieves 95.9% token-level F1 score in detecting malicious tokens.

02

CLASP maintains high detection performance on attack patterns not seen during training.

03

CLASP runs with low computational overhead, making it suitable for real-world deployment.
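For readers unfamiliar with the metric in the first finding, a token-level F1 score can be computed as sketched below. This assumes the reported figure is the F1 of the malicious (positive) class over per-token labels; the page does not say which averaging the authors use, and the function name `token_level_f1` and the example labels are illustrative only.

```python
def token_level_f1(true_labels, predicted_labels):
    """F1 of the positive class (1 = malicious) over per-token labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malicious, flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malicious, missed
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels for eight tokens: 3 true positives, 1 false positive, 1 miss.
true_labels = [0, 0, 1, 1, 1, 0, 0, 1]
predicted_labels = [0, 1, 1, 1, 0, 0, 0, 1]
score = token_level_f1(true_labels, predicted_labels)  # precision = recall = 0.75
```

Because the score is computed per token, not per document, a detector is rewarded for locating the adversarial span precisely and not merely for noticing that an input contains one.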

Abstract

State space models (SSMs) like Mamba have gained significant traction as efficient alternatives to Transformers, achieving linear complexity while maintaining competitive performance. However, Hidden State Poisoning Attacks (HiSPAs), a recently discovered vulnerability that corrupts SSM memory through adversarial strings, pose a critical threat to these architectures and their hybrid variants. Framing the HiSPA mitigation task as a binary classification problem at the token level, we introduce the CLASP model (Classifier Against State Poisoning) to defend against this threat. CLASP exploits distinct patterns in Mamba's block output embeddings (BOEs) and uses an XGBoost classifier to identify malicious tokens with minimal computational overhead. We consider a realistic scenario in which both SSMs and HiSPAs are likely to be used: an LLM screening résumés to identify the best…

Code & Models

Repositories

https://anonymous.4open.science/r/hispikes-91C0
