Controllability Analysis of State Space-based Language Model
Mohamed Mabrok, Yalda Zafari

TL;DR
This paper introduces the Influence Score, a controllability metric for state-space models like Mamba, revealing how internal dynamics relate to model size, architecture, and emergent behaviors, and providing a new diagnostic tool.
Contribution
The paper proposes and validates the Influence Score, a novel controllability metric for state-space language models, enabling better understanding and comparison of their internal dynamics.
Findings
Influence Score increases with model size and training data.
Mamba models show recency bias and influence concentrated in mid-to-late layers.
Emergent behaviors only appear at larger scales, affecting token influence patterns.
Abstract
State-space models (SSMs), particularly Mamba, have become powerful architectures for sequence modeling, yet their internal dynamics remain poorly understood compared to attention-based models. We introduce and validate the Influence Score, a controllability-based metric derived from the discretized state-space parameters of Mamba and computed through a backward recurrence analogous to system observability. The score quantifies how strongly a token at position k affects all later states and outputs. We evaluate this measure across three Mamba variants: mamba-130m, mamba-2.8b, and mamba-2.8b-slimpj, using six experiments that test its sensitivity to temperature, prompt complexity, token type, layer depth, token position, and input perturbations. The results show three main insights: (1) the Influence Score increases with model size and training data, reflecting model capacity; (2) Mamba…
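The abstract does not spell out the formula, so the following is only a minimal sketch of what a backward, observability-style recurrence over discretized SSM parameters could look like. It assumes a diagonal, time-varying system h_t = A_t h_{t-1} + B_t x_t, y_t = C_t h_t, and scores token k by the summed squared gain from x_k to every output y_t with t >= k. The function name `influence_scores`, the array layout, and this particular definition of the score are assumptions for illustration, not the authors' exact method.

```python
import numpy as np

def influence_scores(A_bar, B_bar, C):
    """Hypothetical influence score for a diagonal, time-varying SSM.

    A_bar, B_bar, C: arrays of shape (T, N) holding, per time step, the
    diagonal of the discretized state matrix, the input vector, and the
    output vector. Returns an array of shape (T,) where
    score[k] = sum over t >= k of (C_t . (A_t * ... * A_{k+1} * B_k))**2.
    """
    T, N = A_bar.shape
    W = np.zeros((N, N))          # running observability-style Gramian
    scores = np.empty(T)
    for k in range(T - 1, -1, -1):
        if k + 1 < T:
            a = A_bar[k + 1]
            # Propagate the Gramian one step back: A_{k+1}^T W A_{k+1}
            W = (a[:, None] * W) * a[None, :]
        # Add the direct contribution of the output at step k
        W = W + np.outer(C[k], C[k])
        # Squared gain from the input at step k to all outputs from k on
        scores[k] = B_bar[k] @ W @ B_bar[k]
    return scores

# Toy example: scalar state, constant decay 0.5, unit input and output maps
T = 3
scores = influence_scores(
    np.full((T, 1), 0.5), np.ones((T, 1)), np.ones((T, 1))
)
```

In this toy setting the parameters are constant, so earlier tokens score higher simply because more later outputs remain for them to affect. In a selective model such as Mamba the parameters depend on the input, so the position profile need not follow this pattern.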
Taxonomy
Topics: Speech Recognition and Synthesis · Generative Adversarial Networks and Image Synthesis · Bayesian Methods and Mixture Models
