Quantizing Small-Scale State-Space Models for Edge AI
Leo Zhao, Tristan Torchet, Melika Payvand, Laura Kriener, Filippo Moro

TL;DR
This paper studies quantization of small-scale state-space models, specifically the S4D architecture, to reduce memory and computation costs for edge AI. Quantization-aware training raises performance on sequential MNIST from 40% (after post-training quantization) to 96%.
Contribution
The paper analyzes how quantization affects SSM parameters and activations, proposes a heterogeneous quantization strategy, and shows that quantization-aware training (QAT) enables lower-precision deployment without performance loss.
Findings
QAT improves performance from 40% (after post-training quantization, PTQ) to 96% on sequential MNIST.
Heterogeneous quantization reduces the memory footprint by a factor of 6.
The state matrix A and the internal state x are particularly sensitive to quantization.
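The memory effect of heterogeneous quantization comes down to simple arithmetic: each tensor contributes its parameter count times its bit-width. The sketch below illustrates that calculation only. The tensor names, parameter counts, and bit-width assignments are hypothetical placeholders, not values from the paper, and the resulting ratio is not meant to reproduce the reported factor of 6.

```python
def footprint_bits(param_counts, bit_widths):
    """Total storage in bits when each tensor has its own bit-width."""
    return sum(param_counts[name] * bit_widths[name] for name in param_counts)


# Hypothetical parameter counts for one small SSM layer (not from the paper).
param_counts = {"A": 128, "B": 128, "C": 128, "D": 64, "mixing": 4096}

# Baseline: every tensor stored at 32 bits.
uniform_32 = {name: 32 for name in param_counts}

# Hypothetical heterogeneous assignment: the sensitive state matrix A keeps
# more bits, while the larger and (assumed) more robust tensors get fewer.
heterogeneous = {"A": 16, "B": 8, "C": 8, "D": 8, "mixing": 4}

baseline = footprint_bits(param_counts, uniform_32)
reduced = footprint_bits(param_counts, heterogeneous)
ratio = baseline / reduced

print(f"baseline: {baseline} bits, heterogeneous: {reduced} bits, ratio: {ratio:.2f}x")
```

Because the total is dominated by the largest tensors, giving extra bits to a small but sensitive tensor such as A costs little memory overall.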
Abstract
State-space models (SSMs) have recently gained attention in deep learning for their ability to efficiently model long-range dependencies, making them promising candidates for edge-AI applications. In this paper, we analyze the effects of quantization on small-scale SSMs with a focus on reducing memory and computational costs while maintaining task performance. Using the S4D architecture, we first investigate post-training quantization (PTQ) and show that the state matrix A and internal state x are particularly sensitive to quantization. Furthermore, we analyze the impact of different quantization techniques applied to the parameters and activations in the S4D architecture. To address the observed performance drop after PTQ, we apply quantization-aware training (QAT), significantly improving performance from 40% (PTQ) to 96% on the sequential MNIST benchmark…
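To give some intuition for why A and x can be sensitive, here is a minimal sketch using a single real-valued discrete-time pole in place of the full S4D layer. This is a simplification: S4D uses a complex diagonal state matrix, and the quantizer, the pole value 0.99, and the bit-widths below are illustrative assumptions rather than the paper's setup.

```python
def quantize(value, bits, max_abs):
    """Symmetric uniform quantizer: snap value to a signed grid and clip."""
    levels = 2 ** (bits - 1) - 1          # e.g. 7 positive levels for 4 bits
    scale = max_abs / levels              # grid spacing
    q = max(-levels, min(levels, round(value / scale)))
    return q * scale


def impulse_response(a, steps, state_bits=None):
    """Run x[k] = a * x[k-1] + u[k] with a unit impulse at k = 0.

    Returns the state at every step. If state_bits is given, the state is
    re-quantized after every update.
    """
    x, xs = 0.0, []
    for k in range(steps):
        u = 1.0 if k == 0 else 0.0
        x = a * x + u
        if state_bits is not None:
            x = quantize(x, state_bits, max_abs=1.0)
        xs.append(x)
    return xs


a = 0.99                                  # hypothetical slowly decaying pole
steps = 200
reference = impulse_response(a, steps)

# Quantize only the pole (the scalar stand-in for A).
a_q8 = quantize(a, bits=8, max_abs=1.0)
a_q4 = quantize(a, bits=4, max_abs=1.0)   # lands on the top grid point, 1.0
err_a8 = abs(impulse_response(a_q8, steps)[-1] - reference[-1])
err_a4 = abs(impulse_response(a_q4, steps)[-1] - reference[-1])

# Quantize only the state x, keeping the pole at full precision.
err_x4 = abs(impulse_response(a, steps, state_bits=4)[-1] - reference[-1])

print(f"A at 8 bits: {err_a8:.3f}  A at 4 bits: {err_a4:.3f}  x at 4 bits: {err_x4:.3f}")
```

In this toy setting, a coarse grid rounds the pole onto 1.0, so the impulse response stops decaying. A coarse state grid has a similar effect: each small decay step is rounded back to the same level, so the state never leaves its initial value. Both errors compound over the sequence length, which is one plausible reading of why recurrent quantities are more fragile than feed-forward weights.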
Taxonomy
Topics: Neural Networks and Applications · Age of Information Optimization
