Quantizing Small-Scale State-Space Models for Edge AI

Leo Zhao; Tristan Torchet; Melika Payvand; Laura Kriener; Filippo Moro

arXiv:2506.12480·cs.LG·June 17, 2025

TL;DR

This paper studies quantization of small-scale state-space models (SSMs), specifically the S4D architecture, to reduce memory and computation costs for edge AI. It shows that quantization-aware training (QAT) recovers most of the performance lost to post-training quantization (PTQ), raising it from 40% to 96% on sequential MNIST.

Contribution

The paper analyzes how quantization affects the parameters and activations of SSMs, proposes a heterogeneous quantization strategy, and shows how QAT can enable lower-precision deployment without performance loss.

Findings

01

QAT improves performance on the sequential MNIST benchmark from 40% (after PTQ) to 96%.

02

Heterogeneous quantization reduces memory by 6x.

03

The state matrix A and the internal state x are particularly sensitive to quantization.
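
As a rough illustration of how heterogeneous (per-tensor) bit-widths translate into a memory saving, the arithmetic can be sketched as follows. The parameter counts and bit-widths here are invented for the example and are not the configuration used in the paper; `memory_bits` is a hypothetical helper.

```python
def memory_bits(param_counts, bit_widths):
    """Total storage in bits when each tensor is stored at its own bit-width."""
    return sum(param_counts[name] * bit_widths[name] for name in param_counts)

# Hypothetical parameter counts for one small SSM layer (not from the paper).
counts = {"A": 64, "B": 64, "C": 64, "D": 1}

# Baseline: every tensor stored as 32-bit floats.
fp32 = {name: 32 for name in counts}

# Hypothetical heterogeneous assignment: keep more bits for the sensitive A.
mixed = {"A": 8, "B": 8, "C": 4, "D": 8}

baseline_bits = memory_bits(counts, fp32)
mixed_bits = memory_bits(counts, mixed)
print(f"baseline: {baseline_bits} bits, mixed: {mixed_bits} bits")
print(f"reduction factor: {baseline_bits / mixed_bits:.2f}x")
```

The reduction factor depends entirely on which tensors dominate the parameter count and which bit-widths they tolerate, which is why a sensitivity analysis per component matters.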

Abstract

State-space models (SSMs) have recently gained attention in deep learning for their ability to efficiently model long-range dependencies, making them promising candidates for edge-AI applications. In this paper, we analyze the effects of quantization on small-scale SSMs with a focus on reducing memory and computational costs while maintaining task performance. Using the S4D architecture, we first investigate post-training quantization (PTQ) and show that the state matrix A and internal state x are particularly sensitive to quantization. Furthermore, we analyze the impact of different quantization techniques applied to the parameters and activations in the S4D architecture. To address the observed performance drop after PTQ, we apply quantization-aware training (QAT), significantly improving performance from 40% (PTQ) to 96% on the sequential MNIST benchmark…
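
To make the PTQ step concrete, here is a minimal sketch of quantizing only the state matrix of a toy diagonal SSM and measuring how far the output drifts. It rests on several assumptions that are not from the paper: a real-valued, already-discretized diagonal recurrence (S4D itself uses a complex diagonal state matrix), a symmetric per-tensor uniform quantizer, and randomly drawn parameters. The function names `fake_quantize` and `run_diagonal_ssm` are hypothetical.

```python
import numpy as np

def fake_quantize(w, num_bits):
    """Symmetric uniform quantization: round onto a signed integer grid, then map back to floats."""
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 127 for 8 bits
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    scale = max_abs / qmax                    # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

def run_diagonal_ssm(a, b, c, u):
    """Toy recurrence x[k] = a * x[k-1] + b * u[k], y[k] = c . x[k] with diagonal a."""
    x = np.zeros_like(a)
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = a * x + b * u_k
        y[k] = np.dot(c, x)
    return y

# Randomly drawn toy parameters (assumption: stable real poles close to 1).
rng = np.random.default_rng(0)
n = 16
a = rng.uniform(0.90, 0.999, size=n)
b = rng.normal(size=n)
c = rng.normal(size=n)
u = rng.normal(size=200)

y_ref = run_diagonal_ssm(a, b, c, u)
for bits in (8, 4):
    # Quantize only the state matrix; B and C stay in full precision.
    y_q = run_diagonal_ssm(fake_quantize(a, bits), b, c, u)
    err = np.max(np.abs(y_q - y_ref))
    print(f"A quantized to {bits} bits: max output error = {err:.4f}")
```

Because the state matrix is applied at every time step, a small rounding error in its entries is compounded along the sequence, which is one intuition for the sensitivity the abstract reports. QAT would insert the same rounding into the forward pass during training, typically with a straight-through estimator for the gradient, so the remaining parameters can adapt to it; that part is not shown here.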

Taxonomy

Topics: Neural Networks and Applications · Age of Information Optimization