mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling
Tristan Torchet, Christian Metzner, Karthik Charan Raghunathan, Jimmy Weber, Sebastian Billaudelle, Laura Kriener, Melika Payvand

TL;DR
mGRADE is a hybrid-memory sequence model that combines delay embeddings with gated recurrence to model both long-range dependencies and fast dynamics under strict memory constraints.
Contribution
The paper introduces a hybrid architecture that pairs a convolution with learnable temporal spacings, which acts as a delay embedding, with a lightweight gated recurrent component, enabling multi-timescale sequence modeling on memory-constrained devices.
Findings
mGRADE reduces memory usage by up to 8x compared to SOTA models.
It maintains competitive accuracy on Long-Range Arena and speech classification tasks.
The learnable spacings function as delay embeddings for fast dynamics reconstruction.
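To make the delay-embedding idea concrete, here is a minimal NumPy sketch of a classical delay embedding: a single observed signal is stacked with delayed copies of itself so that unobserved state can be recovered from its history. The function name `delay_embed`, the sine-wave signal, and the fixed delays are illustrative assumptions, not taken from the paper; in mGRADE the spacings are learned rather than hand-picked.

```python
import numpy as np

def delay_embed(x, delays):
    """Stack delayed copies of a 1-D signal.

    Row i of the result is [x[t - d] for d in delays], where t runs over
    the time steps that have a full history available.
    `delays` must contain non-negative integers (illustrative choice).
    """
    x = np.asarray(x, dtype=float)
    delays = np.asarray(delays, dtype=int)
    max_d = int(delays.max())
    # Keep only time steps for which every delayed sample exists.
    t = np.arange(max_d, len(x))
    return x[t[:, None] - delays[None, :]]

# Hypothetical example: only one coordinate of an oscillator is observed.
time = np.linspace(0.0, 4.0 * np.pi, 400)
observed = np.sin(time)

# Two taps: the current sample and the sample 25 steps earlier.
embedded = delay_embed(observed, delays=[0, 25])
print(embedded.shape)
```

Each row of `embedded` pairs the current sample with an earlier one, so points that share the same sine value but differ in phase land at different coordinates. This is the sense in which a few well-placed delays can stand in for unobserved state.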
Abstract
Multi-timescale sequence modeling relies on capturing both local fast dynamics and global slow context; yet, maintaining these capabilities under the strict memory constraints common to edge devices remains an open challenge. Current State-of-the-Art models with constant memory footprints trade off long-range selectivity and high-precision modeling of fast dynamics. To overcome this trade-off within a fixed memory budget, we propose mGRADE (minimally Gated Recurrent Architecture with Delay Embedding), a hybrid-memory system that introduces inductive biases across timescales by integrating a convolution with learnable temporal spacings with a lightweight gated recurrent component. We show theoretically that the learnable spacings are equivalent to a delay embedding, enabling parameter-efficient reconstruction of partially-observed fast dynamics, while the gated recurrent component…
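The hybrid structure described in the abstract can be sketched roughly as follows. This is not the authors' implementation: it assumes fixed integer delays in place of learnable spacings, a depthwise causal convolution, and a minGRU-style update `h_t = (1 - z_t) * h_{t-1} + z_t * c_t` in which the gate depends only on the current input. All names (`delay_conv`, `minimal_gated_recurrence`, `W_z`, `W_c`) and shapes are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def delay_conv(x, kernel, delays):
    """Causal depthwise convolution with taps at the given integer delays.

    x: (T, D) input, kernel: (K, D) weights, delays: K non-negative ints.
    Samples before the start of the sequence are treated as zeros.
    """
    T, D = x.shape
    out = np.zeros((T, D))
    for w, d in zip(kernel, delays):
        d = int(d)
        if d < T:
            # Output at time t receives w * x[t - d].
            out[d:] += w * x[:T - d]
    return out

def minimal_gated_recurrence(u, W_z, W_c):
    """Assumed minGRU-style recurrence: gate and candidate depend only on u[t]."""
    T = u.shape[0]
    z = sigmoid(u @ W_z)      # update gate, shape (T, H)
    c = u @ W_c               # candidate state, shape (T, H)
    h = np.zeros(W_c.shape[1])
    states = np.empty((T, W_c.shape[1]))
    for t in range(T):
        h = (1.0 - z[t]) * h + z[t] * c[t]
        states[t] = h
    return states

# Hypothetical sizes: 50 time steps, 4 channels, 8 hidden units, 3 taps.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))
kernel = rng.normal(size=(3, 4))
delays = [0, 3, 7]
W_z = rng.normal(size=(4, 8))
W_c = rng.normal(size=(4, 8))

states = minimal_gated_recurrence(delay_conv(x, kernel, delays), W_z, W_c)
print(states.shape)
```

In this sketch the recurrent state has a fixed size of 8 values regardless of sequence length, and the convolution only needs the last 7 input samples (the largest delay), which is the kind of constant memory footprint the abstract refers to.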
