SMR: State Memory Replay for Long Sequence Modeling

Biqing Qi, Junqi Gao, Kaiyan Zhang, Dong Li, Jianxing Liu, Ligang Wu, and Bowen Zhou

arXiv:2405.17534 · cs.LG · June 11, 2024

TL;DR

This paper introduces SMR, a novel memory replay mechanism that enhances state space models' ability to handle non-uniform sampling in long sequence modeling, improving stability and generalization.

Contribution

The paper proposes SMR, a plug-and-play memory mechanism that addresses the Non-Stable State (NSS) problem in SSMs, enabling stable modeling of long sequences whose sampling points vary.
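The abstract's idea of adjusting the input sequence with earlier inputs ("early memories") before it reaches the SSM can be pictured with a hypothetical causal input filter. This is not the paper's SMR formulation, which this page does not reproduce. `memory_replay`, `causal_conv`, `full_conv`, the fixed replay weights, and the stand-in SSM kernel are all illustrative assumptions. What the sketch does show is why such an adjustment need not reintroduce recursion: a causal linear adjustment of the input folds into the SSM's convolution kernel, so the combined model is still a single convolution.

```python
def causal_conv(kernel, xs):
    """Causal convolution with zero history: y_t = sum_k kernel[k] * xs[t - k]."""
    return [sum(kernel[k] * xs[t - k] for k in range(min(t + 1, len(kernel))))
            for t in range(len(xs))]


def full_conv(p, q):
    """Full (untruncated) convolution of two finite kernels."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def memory_replay(xs, weights):
    """HYPOTHETICAL input adjustment, not the paper's SMR definition:
    x_tilde_t = x_t + sum_j weights[j - 1] * x_{t - j}, using only past inputs.
    In a trained model `weights` would be learnable; here they are fixed numbers.
    """
    return causal_conv([1.0] + list(weights), xs)


ssm_kernel = [0.5, 0.25, 0.125, 0.0625]  # stand-in for an SSM convolution kernel
replay_weights = [0.2, -0.1]             # hypothetical "early memory" weights
xs = [1.0, 0.0, 2.0, -1.0, 0.5, 3.0]

# Two stages: adjust the input, then run the unchanged SSM convolution.
two_stage = causal_conv(ssm_kernel, memory_replay(xs, replay_weights))
# One stage: fold the replay weights into a single kernel ahead of time.
one_stage = causal_conv(full_conv(ssm_kernel, [1.0] + replay_weights), xs)
print(max(abs(p - q) for p, q in zip(two_stage, one_stage)))
```

Under this assumption the adjustment sits in front of an unchanged SSM, which is one way to read "plug-and-play". Whether a particular adjustment actually mitigates the NSS problem is the subject of the paper's analysis and is not demonstrated here.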

Findings

01

SMR improves long-range modeling performance.

02

SMR enhances stability across different sampling points.

03

Experiments show consistent gains on language modeling and Long Range Arena (LRA) tasks.

Abstract

Despite the promising performance of state space models (SSMs) in long sequence modeling, limitations remain. Advanced SSMs such as S5 and S6 (Mamba) can handle non-uniform sampling, but their recursive structures impede efficient SSM computation via convolution. To overcome this incompatibility with parallel convolutional computation, this paper proposes a novel non-recursive strategy for processing non-uniformly sampled sequences. Analyzing SSMs through the lens of Event-Triggered Control (ETC) theory reveals the Non-Stable State (NSS) problem: deviations from the sampling points the model expects lead to error transmission and accumulation, causing the SSM's hidden state to diverge. The analysis further shows that adjusting input sequences with early memories can mitigate the NSS problem, achieving Sampling Step Adaptation (SSA). Building on this insight, we introduce a…
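The abstract's point that recursive structures conflict with convolutional computation can be made concrete with a textbook diagonal SSM under zero-order-hold (ZOH) discretization. This is an illustrative sketch, not code from the paper: the function names `discretize`, `ssm_scan`, and `ssm_conv`, the diagonal state matrix, and all parameter values are assumptions. With one shared step size Δ, the recurrence unrolls into a single convolution kernel. Once each step carries its own Δ_t (non-uniform sampling), the discretized parameters change from step to step, no single kernel reproduces the output, and only the step-by-step recurrence remains.

```python
import math


def discretize(a, b, delta):
    """ZOH discretization of a diagonal SSM h'(t) = a * h(t) + b * x(t).

    a, b: per-state continuous-time parameters (a_n < 0 keeps each mode stable).
    Returns (a_bar, b_bar) for step size delta.
    """
    a_bar = [math.exp(delta * an) for an in a]
    # ZOH for diagonal A: b_bar_n = (exp(delta * a_n) - 1) / a_n * b_n
    b_bar = [(abn - 1.0) / an * bn for abn, an, bn in zip(a_bar, a, b)]
    return a_bar, b_bar


def ssm_scan(a, b, c, xs, deltas):
    """Recurrent form: h_t = a_bar(delta_t) * h_{t-1} + b_bar(delta_t) * x_t, y_t = c . h_t.

    Each step may use its own delta, so non-uniform sampling is handled,
    but every output depends on the previous hidden state.
    """
    h = [0.0] * len(a)
    ys = []
    for x, delta in zip(xs, deltas):
        a_bar, b_bar = discretize(a, b, delta)
        h = [abn * hn + bbn * x for abn, hn, bbn in zip(a_bar, h, b_bar)]
        ys.append(sum(cn * hn for cn, hn in zip(c, h)))
    return ys


def ssm_conv(a, b, c, xs, delta):
    """Convolutional form: y = K * x with K_k = sum_n c_n * a_bar_n**k * b_bar_n.

    A single kernel K exists only because every step shares the same delta.
    """
    a_bar, b_bar = discretize(a, b, delta)
    length = len(xs)
    kernel = [sum(cn * abn**k * bbn for cn, abn, bbn in zip(c, a_bar, b_bar))
              for k in range(length)]
    return [sum(kernel[k] * xs[t - k] for k in range(t + 1)) for t in range(length)]


a, b, c = [-0.5, -1.0, -2.0], [1.0, 1.0, 1.0], [0.3, -0.2, 0.5]
xs = [1.0, 0.0, 2.0, -1.0, 0.5]

uniform = ssm_scan(a, b, c, xs, [0.1] * len(xs))
nonuniform = ssm_scan(a, b, c, xs, [0.1, 0.3, 0.05, 0.2, 0.1])
kernel_out = ssm_conv(a, b, c, xs, 0.1)

# Uniform steps: recurrence and convolution agree up to floating-point error.
print(max(abs(u - k) for u, k in zip(uniform, kernel_out)))
# Non-uniform steps: the fixed-delta kernel no longer reproduces the recurrence.
print(max(abs(n - k) for n, k in zip(nonuniform, kernel_out)))
```

The first printed gap should sit at floating-point noise level; the second should not, because the recurrence used step sizes other than the kernel's Δ = 0.1 after the first step.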

Videos

SMR: State Memory Replay for Long Sequence Modeling (Underline)

Taxonomy

Topics: Parallel Computing and Optimization Techniques