SpectralGuard: Detecting Memory Collapse Attacks in State Space Models
Davi Bonetto

TL;DR
SpectralGuard is a real-time spectral monitor that detects memory collapse attacks on State Space Models (SSMs) by tracking spectral stability across all model layers, making SSM sequence processing safer and more reliable.
Contribution
The paper introduces SpectralGuard, a spectral monitoring method that detects memory collapse attacks in SSMs and can be deployed as a safety layer. It also proves that any output-only defense can be evaded by inputs that induce spectral collapse.
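To make "spectral monitoring" concrete, here is a minimal Python sketch of a per-layer threshold monitor. It assumes a Mamba-style layer with a diagonal, strictly negative state matrix A discretized as Ā = exp(Δ·A), so the spectral radius ρ(Ā) is simply the largest exp(Δ·a_i). It further simplifies Δ to one scalar per token per layer (Mamba itself uses a per-channel Δ). The names `monitor`, `SpectralAlert`, and `layer_spectral_radius`, the 0.9 threshold, and all example values are hypothetical; this illustrates the idea and is not the paper's detector, which may combine more signals than a single cutoff.

```python
import math
from dataclasses import dataclass


@dataclass
class SpectralAlert:
    """One flagged (layer, token) position and its spectral radius."""
    layer: int
    token: int
    rho: float


def layer_spectral_radius(a_diag, delta):
    """rho(A_bar) for one layer, assuming diagonal A < 0 and A_bar = exp(delta * A)."""
    return max(math.exp(delta * a) for a in a_diag)


def monitor(a_per_layer, deltas, threshold=0.9):
    """Return an alert for every (layer, token) whose rho(A_bar) drops below threshold.

    a_per_layer[l]: diagonal entries of A in layer l (hypothetical layout)
    deltas[l][t]:   scalar step size for token t in layer l (a simplification)
    threshold:      hypothetical cutoff, not a value taken from the paper
    """
    alerts = []
    for layer, (a_diag, layer_deltas) in enumerate(zip(a_per_layer, deltas)):
        for token, delta in enumerate(layer_deltas):
            rho = layer_spectral_radius(a_diag, delta)
            if rho < threshold:
                alerts.append(SpectralAlert(layer, token, rho))
    return alerts


# Hypothetical two-layer model and step-size traces for three tokens.
a_layers = [[-1.0, -2.0], [-0.5, -3.0]]
benign = [[0.01, 0.02, 0.01], [0.01, 0.01, 0.02]]
attacked = [[0.01, 0.90, 0.01], [0.01, 0.01, 0.02]]

print(monitor(a_layers, benign))    # -> []
print(monitor(a_layers, attacked))  # one alert: layer 0, token 1, rho ~ 0.41
```

The check reads the model's internal spectra rather than its outputs, which matches the paper's argument that output-level signals alone can miss spectral collapse; a plain threshold also keeps the per-token cost low.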
Findings
SpectralGuard achieves F1=0.961 against non-adaptive attacks.
It maintains F1=0.842 under adaptive attack scenarios.
Spectral monitoring is effective across different SSM architectures.
Abstract
State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a critical safety vulnerability. We show that the spectral radius ρ(Ā) of the discretized transition operator governs effective memory horizon: when an adversary drives ρ toward zero through gradient-based Hidden State Poisoning, memory collapses from millions of tokens to mere dozens, silently destroying reasoning capacity without triggering output-level alarms. We prove an Evasion Existence Theorem showing that for any output-only defense, adversarial inputs exist that simultaneously induce spectral collapse and evade detection, then introduce SpectralGuard, a real-time monitor that tracks spectral stability across all model layers. SpectralGuard achieves F1=0.961 against non-adaptive attackers and retains F1=0.842 under the…
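The link between ρ(Ā) and memory horizon can be illustrated with a back-of-the-envelope calculation. Assuming a state contribution decays geometrically as ρ^t, and defining the horizon as the number of steps until that factor drops below a tolerance ε (an assumption for illustration; the paper's exact definition may differ), the horizon is t = ln ε / ln ρ. With a Mamba-style discretization Ā = exp(Δ·A) for a diagonal A with negative entries, a small step size Δ keeps ρ near 1, while an input that inflates Δ drives ρ toward zero. The helper names, eigenvalues, and step sizes below are hypothetical.

```python
import math


def spectral_radius_diag(a_diag, delta):
    """Spectral radius of A_bar = exp(delta * A) for a diagonal A.

    Assumption: every entry of a_diag is a strictly negative real number and
    delta > 0, so each eigenvalue exp(delta * a_i) lies in (0, 1) and the
    spectral radius is simply the largest one.
    """
    return max(math.exp(delta * a) for a in a_diag)


def memory_horizon(rho, eps=1e-3):
    """Steps t until rho**t falls below eps, i.e. t = ln(eps) / ln(rho).

    The eps-threshold definition of "memory horizon" is an assumption made
    for illustration.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    return math.log(eps) / math.log(rho)


# Hypothetical eigenvalues of A and two step sizes: a benign small delta and
# an adversarially inflated one.
a_diag = [-1.0, -4.0, -16.0]
for delta in (1e-6, 0.25):
    rho = spectral_radius_diag(a_diag, delta)
    print(f"delta={delta:g}  rho={rho:.6f}  horizon={memory_horizon(rho):,.0f} tokens")
```

With ε = 10⁻³, Δ = 10⁻⁶ gives ρ ≈ 0.999999 and a horizon of roughly 6.9 million tokens, while Δ = 0.25 gives ρ ≈ 0.78 and a horizon of about 28 tokens, mirroring the millions-to-dozens collapse described in the abstract.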
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Physical Unclonable Functions (PUFs) and Hardware Security
