SpectralGuard: Detecting Memory Collapse Attacks in State Space Models

Davi Bonetto

arXiv:2603.12414·cs.LG·March 16, 2026

Open Access

TL;DR

SpectralGuard is a real-time spectral monitoring tool designed to detect and prevent memory collapse attacks in State Space Models, enhancing their safety and reliability during sequence processing.

Contribution

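The paper introduces SpectralGuard, a spectral monitoring method that detects memory collapse attacks in SSMs by tracking spectral stability across all model layers in real time. It also proves that any output-only defense can be evaded, which motivates monitoring the model's internal state as a deployable safety layer.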

Findings

01

SpectralGuard achieves F1=0.961 against non-adaptive attacks.

02

SpectralGuard retains F1=0.842 against adaptive attackers.

03

Spectral monitoring is effective across different SSM architectures.

Abstract

State Space Models (SSMs) such as Mamba achieve linear-time sequence processing through input-dependent recurrence, but this mechanism introduces a critical safety vulnerability. We show that the spectral radius ρ(Ā) of the discretized transition operator governs effective memory horizon: when an adversary drives ρ toward zero through gradient-based Hidden State Poisoning, memory collapses from millions of tokens to mere dozens, silently destroying reasoning capacity without triggering output-level alarms. We prove an Evasion Existence Theorem showing that for any output-only defense, adversarial inputs exist that simultaneously induce spectral collapse and evade detection, then introduce SpectralGuard, a real-time monitor that tracks spectral stability across all model layers. SpectralGuard achieves F1=0.961 against non-adaptive attackers and retains F1=0.842 under the…
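
To make the link between spectral radius and memory horizon concrete, here is a minimal sketch in Python. It is not the paper's implementation. It assumes a diagonal continuous-time state matrix with zero-order-hold discretization (the discretized operator is exp(delta * A)), defines the memory horizon as the number of steps until the slowest-decaying mode falls below a tolerance eps, and uses a hypothetical alarm threshold rho_min. The function names and all numeric values are illustrative assumptions.

```python
import math
import numpy as np


def spectral_radius_diag(a_diag, delta):
    """Spectral radius of exp(delta * A) for a diagonal A.

    Assumption: A is diagonal with negative real entries and is discretized
    with zero-order hold, so the eigenvalues are exp(delta * a_i).
    """
    a_bar = np.exp(delta * np.asarray(a_diag, dtype=float))
    return float(np.max(np.abs(a_bar)))


def memory_horizon(rho, eps=1e-3):
    """Steps t until rho**t drops below eps (eps is an illustrative tolerance)."""
    if rho >= 1.0:
        return math.inf
    if rho <= 0.0:
        return 0.0
    return math.log(eps) / math.log(rho)


def flag_collapse(rhos_per_layer, rho_min=0.9):
    """Indices of layers whose spectral radius is below a hypothetical threshold."""
    return [i for i, rho in enumerate(rhos_per_layer) if rho < rho_min]


# Illustrative diagonal of A; the slowest mode (-0.01) dominates the radius.
a = np.array([-1.0, -0.5, -0.01])

# A small step size keeps the radius close to 1 (long memory).
benign = spectral_radius_diag(a, delta=1e-4)

# An input that inflates the step size pushes the radius down (short memory).
attacked = spectral_radius_diag(a, delta=20.0)

print("benign horizon:  ", memory_horizon(benign))
print("attacked horizon:", memory_horizon(attacked))
print("flagged layers:  ", flag_collapse([benign, attacked]))
```

With these illustrative values the horizon falls from several million steps to a few dozen, and only the second entry is flagged. A per-layer check of this kind reads the model's internal recurrence rather than its outputs, which is the distinction the abstract draws against output-only defenses.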

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Security and Verification in Computing · Physical Unclonable Functions (PUFs) and Hardware Security