mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling

Tristan Torchet; Christian Metzner; Karthik Charan Raghunathan; Jimmy Weber; Sebastian Billaudelle; Laura Kriener; Melika Payvand

arXiv:2507.01829·cs.LG·April 24, 2026


TL;DR

mGRADE is a hybrid-memory sequence model that combines delay embeddings with gated recurrence to model both long-range context and fast dynamics under strict memory constraints.

Contribution

The paper introduces a hybrid architecture that pairs a convolution with learnable temporal spacings, which acts as a delay embedding, with a lightweight gated recurrent component, targeting multi-timescale sequence modeling on memory-constrained devices.

Findings

01

mGRADE reduces memory usage by up to 8× compared to state-of-the-art models.

02

It maintains competitive accuracy on Long-Range Arena and speech classification tasks.

03

The learnable spacings function as delay embeddings for fast dynamics reconstruction.
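
To make the delay-embedding idea concrete, here is a minimal sketch of a classical delay embedding of a scalar signal. It is an illustration only, not the authors' code: the function name `delay_embed` and the fixed integer delays are assumptions made for this example, whereas the paper's spacings are learned.

```python
import numpy as np

def delay_embed(x, delays):
    """Stack delayed copies of a 1-D signal.

    Row i corresponds to time t = max(delays) + i and holds
    [x[t - d] for d in delays]. Delays are fixed integers here
    (an assumption for illustration; the paper learns its spacings).
    """
    x = np.asarray(x, dtype=float)
    delays = list(delays)
    max_d = max(delays)
    # Keep only time steps for which every delayed sample exists.
    t = np.arange(max_d, len(x))
    return np.stack([x[t - d] for d in delays], axis=1)

x = np.arange(10.0)                      # toy signal 0, 1, ..., 9
emb = delay_embed(x, delays=[0, 2, 5])   # shape (5, 3)
print(emb[0])                            # samples x[5], x[3], x[0]
```

Each row is a short vector of past samples taken at the chosen lags. In the classical setting, such vectors can stand in for the unobserved state of a partially observed system, which is the role the finding above attributes to the learnable spacings.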

Abstract

Multi-timescale sequence modeling relies on capturing both local fast dynamics and global slow context; yet, maintaining these capabilities under the strict memory constraints common to edge devices remains an open challenge. Current State-of-the-Art models with constant memory footprints trade off long-range selectivity and high-precision modeling of fast dynamics. To overcome this trade-off within a fixed memory budget, we propose mGRADE (minimally Gated Recurrent Architecture with Delay Embedding), a hybrid-memory system that introduces inductive biases across timescales by integrating a convolution with learnable temporal spacings with a lightweight gated recurrent component. We show theoretically that the learnable spacings are equivalent to a delay embedding, enabling parameter-efficient reconstruction of partially-observed fast dynamics, while the gated recurrent component…
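
The two components named in the abstract can be sketched roughly as follows. This is a hedged toy version under stated assumptions, not the paper's implementation: the spacing is a fixed integer rather than a learned value, the recurrent update is a minGRU-style gate chosen for illustration (the listing only says "lightweight gated recurrent component"), and all names (`delay_conv`, `gated_recurrence`, `w_z`, `w_h`) are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def delay_conv(x, weights, spacing):
    """Causal per-channel convolution with taps `spacing` steps apart.

    x: (T, C) input, weights: (K, C) taps.
    Tap k reads x[t - k * spacing]; samples before t = 0 count as zero.
    The spacing is a fixed integer here (assumption; learned in the paper).
    """
    T, _ = x.shape
    y = np.zeros_like(x)
    for k in range(weights.shape[0]):
        shift = k * spacing
        if shift < T:
            y[shift:] += weights[k] * x[:T - shift]
    return y

def gated_recurrence(u, w_z, w_h):
    """minGRU-style update (assumed form): gate and candidate depend only on the input."""
    T, C = u.shape
    h = np.zeros(C)
    out = np.empty_like(u)
    for t in range(T):
        z = sigmoid(u[t] @ w_z)          # update gate in (0, 1)
        cand = u[t] @ w_h                # candidate state
        h = (1.0 - z) * h + z * cand     # convex mix of old state and candidate
        out[t] = h
    return out

# Toy run: a unit impulse through a 2-tap convolution with spacing 3.
x = np.zeros((8, 1))
x[0, 0] = 1.0
y = delay_conv(x, weights=np.array([[1.0], [2.0]]), spacing=3)
out = gated_recurrence(y, w_z=np.zeros((1, 1)), w_h=np.eye(1))
print(y[:, 0])   # impulse response: tap 0 at t = 0, tap 1 at t = 3
```

In this sketch the streaming state is constant in sequence length: a buffer of the last `(K - 1) * spacing` inputs for the convolution plus one hidden vector for the recurrence.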
