Long Expressive Memory for Sequence Modeling
T. Konstantin Rusch, Siddhartha Mishra, N. Benjamin Erichson, Michael W. Mahoney

TL;DR
Long Expressive Memory (LEM) is a new gradient-based method that effectively captures long-term dependencies in sequential data, outperforming existing RNN variants across various tasks.
Contribution
LEM introduces a multiscale differential equation framework with theoretical guarantees, enhancing long-term dependency learning and mitigating gradient issues.
Findings
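LEM outperforms state-of-the-art recurrent neural networks, including GRUs and LSTMs, on tasks spanning image and time-series classification, dynamical systems prediction, speech recognition, and language modeling.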
Theoretical bounds show mitigation of exploding and vanishing gradients.
LEM can accurately approximate complex dynamical systems.
Abstract
We propose a novel method called Long Expressive Memory (LEM) for learning long-term sequential dependencies. LEM is gradient-based, it can efficiently process sequential tasks with very long-term dependencies, and it is sufficiently expressive to be able to learn complicated input-output maps. To derive LEM, we consider a system of multiscale ordinary differential equations, as well as a suitable time-discretization of this system. For LEM, we derive rigorous bounds to show the mitigation of the exploding and vanishing gradients problem, a well-known challenge for gradient-based recurrent sequential learning methods. We also prove that LEM can approximate a large class of dynamical systems to high accuracy. Our empirical results, ranging from image and time-series classification through dynamical systems prediction to speech recognition and language modeling, demonstrate that LEM…
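The abstract describes LEM as a time-discretization of a two-scale ODE system but does not state the update equations. The following is a minimal NumPy sketch of one plausible form of that discretization: two hidden states, y and z, each relaxed toward a tanh target with its own input-dependent, sigmoid-gated time step. The function names (init_params, lem_step, run_lem), the weight initialisation, and the exact form of the update are assumptions made for illustration, not taken from this summary, and should be checked against the paper before use.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(input_dim, hidden_dim, rng):
    """Hypothetical initialiser: one (W, V, b) triple per affine map."""
    s = 1.0 / np.sqrt(hidden_dim)

    def affine():
        return (
            rng.uniform(-s, s, (hidden_dim, hidden_dim)),  # recurrent weights W
            rng.uniform(-s, s, (hidden_dim, input_dim)),   # input weights V
            rng.uniform(-s, s, hidden_dim),                # bias b
        )

    # "dt1"/"dt2" gate the two time scales; "z"/"y" drive the two states.
    return {name: affine() for name in ("dt1", "dt2", "z", "y")}


def lem_step(params, y, z, u, dt=1.0):
    """One assumed LEM-style update of the state pair (y, z) for input u."""

    def lin(name, h):
        W, V, b = params[name]
        return W @ h + V @ u + b

    # Learned, input-dependent time steps, each entry in (0, dt).
    dt1 = dt * sigmoid(lin("dt1", y))
    dt2 = dt * sigmoid(lin("dt2", y))

    # Each state moves toward a tanh target by its own step size.
    z_new = (1.0 - dt1) * z + dt1 * np.tanh(lin("z", y))
    y_new = (1.0 - dt2) * y + dt2 * np.tanh(lin("y", z_new))
    return y_new, z_new


def run_lem(params, inputs, hidden_dim, dt=1.0):
    """Unroll the step over a sequence of shape (time, input_dim)."""
    y = np.zeros(hidden_dim)
    z = np.zeros(hidden_dim)
    for u in inputs:
        y, z = lem_step(params, y, z, u, dt)
    return y, z


rng = np.random.default_rng(0)
params = init_params(input_dim=3, hidden_dim=8, rng=rng)
inputs = rng.normal(size=(500, 3))
y_final, z_final = run_lem(params, inputs, hidden_dim=8)
```

In this sketch, with dt at most 1 and zero initial states, every update is an elementwise convex combination of the previous state and a tanh output, so the hidden states stay within [-1, 1] however long the sequence is. That boundedness is a property of the assumed update form; it is not a restatement of the paper's gradient bounds.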
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Computational Physics and Python Applications
