GHOST: Unmasking Phantom States in Mamba2 via Grouped Hidden-state Output-aware Selection & Truncation
Michael Menezes, Anastasios Kyrillidis

TL;DR
GHOST is a structured pruning method that halves the state dimension of Mamba2 models using only forward-pass statistics, preserving model fidelity while substantially reducing inference overhead.
Contribution
GHOST introduces a control-theory-inspired structured pruning framework that approximates balanced truncation using only forward-pass statistics (no backpropagation), targeting the inference bottleneck created by Mamba2's expanded state dimension.
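For readers unfamiliar with balanced truncation, the classical procedure that GHOST is said to approximate can be sketched on a small linear time-invariant system. This is textbook balanced truncation on a toy system, not GHOST itself; the function names and matrices are illustrative. The Hankel singular values combine the controllability Gramian (how strongly inputs excite each state) with the observability Gramian (how strongly each state appears in the output), and balanced truncation keeps the states with the largest values. The dense Kronecker solve used here scales badly and is only meant for tiny systems.

```python
import numpy as np

def discrete_gramian(A, M):
    """Solve X = A @ X @ A.T + M (discrete Lyapunov equation) by vectorization.
    Assumes A is stable (spectral radius < 1); only suitable for small systems."""
    n = A.shape[0]
    lhs = np.eye(n * n) - np.kron(A, A)
    x = np.linalg.solve(lhs, M.reshape(-1))
    return x.reshape(n, n)

def hankel_singular_values(A, B, C):
    """Hankel singular values of x_{t+1} = A x_t + B u_t, y_t = C x_t, largest first."""
    P = discrete_gramian(A, B @ B.T)    # controllability Gramian
    Q = discrete_gramian(A.T, C.T @ C)  # observability Gramian
    eig = np.linalg.eigvals(P @ Q).real
    return np.sort(np.sqrt(np.clip(eig, 0.0, None)))[::-1]

# Three decoupled modes with different decay rates.
A = np.diag([0.9, 0.5, 0.1])
B = np.ones((3, 1))
C = np.ones((1, 3))
print(hankel_singular_values(A, B, C))
```

Computing exact Gramians requires the full linear system, which is exactly what a selective, input-dependent SSM like Mamba2 does not have; this gap is what motivates estimating the same two quantities from forward-pass statistics instead.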
Findings
Achieves 50% state-dimension reduction on models from 130M to 2.7B parameters.
Increases WikiText-2 perplexity by only about 1 point.
Reduces inference overhead significantly while preserving model fidelity.
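The memory effect of a 50% state-dimension reduction can be checked with back-of-the-envelope arithmetic. The sketch assumes that each Mamba2 layer's recurrent SSM state holds n_heads × head_dim × d_state values (ignoring the small convolution cache) and uses a hypothetical configuration (64 layers, inner width 5120, head dimension 64, d_state 128, fp16 storage). These numbers are illustrative and are not taken from the paper.

```python
def ssm_state_bytes(n_layers, d_inner, head_dim, d_state, bytes_per_elem=2, batch=1):
    """Bytes of recurrent SSM state carried between decoding steps (assumed layout)."""
    n_heads = d_inner // head_dim
    return batch * n_layers * n_heads * head_dim * d_state * bytes_per_elem

# Hypothetical config: 64 layers, 80 heads of size 64, fp16.
full = ssm_state_bytes(n_layers=64, d_inner=5120, head_dim=64, d_state=128)
half = ssm_state_bytes(n_layers=64, d_inner=5120, head_dim=64, d_state=64)
print(full / 2**20, half / 2**20)  # MiB per sequence, before and after pruning
```

Because the whole state is read and rewritten at every autoregressive step, halving d_state also roughly halves the per-token memory traffic spent on that state, to first order.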
Abstract
While Mamba2's expanded state dimension enhances temporal modeling, it incurs substantial inference overhead that saturates bandwidth during autoregressive generation. Standard pruning methods fail to address this bottleneck: unstructured sparsity leaves activations dense, magnitude-based selection ignores runtime dynamics, and gradient-based methods impose prohibitive costs. We introduce GHOST (Grouped Hidden-state Output-aware Selection and Truncation), a structured pruning framework that approximates control-theoretic balanced truncation using only forward-pass statistics. By jointly measuring controllability and observability, GHOST rivals the fidelity of gradient-based methods without requiring backpropagation. As a highlight, on models ranging from 130M to 2.7B parameters, our approach achieves a 50% state-dimension reduction with approximately 1 perplexity point increase on…
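The idea of jointly measuring controllability and observability from forward-pass statistics can be illustrated with a deliberately crude proxy. The sketch assumes hidden states h of shape (T, H, P, N) and output projections C of shape (T, G, N) recorded on calibration data, with heads assigned to B/C groups contiguously. It scores each (group, state index) pair by the geometric mean of its average state energy and its average squared readout weight, then keeps the top k per group. The functions state_scores and keep_mask and the scoring rule are hypothetical; they are not the paper's GHOST criterion, which the abstract describes only at a high level. Scores are pooled per group because dropping state index n in a group removes it for every head that shares that group's B and C.

```python
import numpy as np

def state_scores(h, C, heads_per_group):
    """Hypothetical per-(group, state) saliency from forward-pass statistics.

    h: recorded hidden states, shape (T, H, P, N) = (time, heads, head_dim, d_state)
    C: recorded output projections, shape (T, G, N), shared within each group
    Returns an array of shape (G, N).
    """
    _, H, _, N = h.shape
    G = C.shape[1]
    assert H == G * heads_per_group, "heads must split evenly into groups"
    # Controllability proxy: mean energy of each state index, per head ...
    ctrl = (h ** 2).mean(axis=(0, 2))                         # (H, N)
    # ... pooled over the heads of each group (contiguous assignment assumed).
    ctrl = ctrl.reshape(G, heads_per_group, N).mean(axis=1)  # (G, N)
    # Observability proxy: mean squared readout weight of each state index.
    obs = (C ** 2).mean(axis=0)                               # (G, N)
    return np.sqrt(ctrl * obs)

def keep_mask(scores, k):
    """Boolean mask keeping the k highest-scoring state indices in each group."""
    top = np.argsort(-scores, axis=1)[:, :k]
    mask = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask

# Toy calibration statistics: state index 0 is never read out (C = 0).
rng = np.random.default_rng(0)
h = rng.standard_normal((16, 4, 8, 6))
C = rng.standard_normal((16, 2, 6))
C[:, :, 0] = 0.0
mask = keep_mask(state_scores(h, C, heads_per_group=2), k=3)  # keep 50% of N = 6
print(mask)
```

Pruning would then slice the kept indices out of each group's B and C projections and out of the state itself. A closer approximation to balanced truncation would also account for the decay dynamics, which this sketch ignores.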
Taxonomy
Topics: Model Reduction and Neural Networks · Quantum many-body systems · Neural Networks and Reservoir Computing
