VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting
Yujin Tang, Peijie Dong, Zhenheng Tang, Xiaowen Chu, Junwei Liang

TL;DR
This paper introduces VMRNN, a recurrent architecture that combines Vision Mamba blocks with LSTM to improve efficiency and accuracy in spatiotemporal forecasting. It targets two limitations of earlier designs: the narrow receptive fields of CNNs and the computational cost of attention in ViTs.
Contribution
The paper presents the VMRNN cell, integrating Vision Mamba blocks with LSTM, offering a new approach for effective and efficient spatiotemporal prediction.
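To make the idea of pairing a feature-mixing block with LSTM-style gating concrete, here is a minimal sketch in plain Python. It uses the textbook LSTM update (sigmoid gates, tanh candidate), since this summary does not give the paper's exact gating equations. The function `placeholder_mixer` is a hypothetical stand-in for the Vision Mamba blocks and is not the authors' implementation; in the real cell, that is where the learned spatial modeling would happen.

```python
import math


def sigmoid(x):
    """Logistic function, used for the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def placeholder_mixer(x, h_prev):
    """Hypothetical stand-in for the Vision Mamba blocks (an assumption).

    It simply adds the input and the previous hidden state element-wise and
    reuses that sum as the pre-activation of all four gates. A real model
    would compute each pre-activation with learned layers.
    """
    pre = [xi + hi for xi, hi in zip(x, h_prev)]
    return pre, pre, pre, pre  # input, forget, output, candidate


def lstm_style_step(x, h_prev, c_prev, mixer=placeholder_mixer):
    """One recurrent step using the textbook LSTM update rule."""
    pre_i, pre_f, pre_o, pre_g = mixer(x, h_prev)
    h_next, c_next = [], []
    for pi, pf, po, pg, c in zip(pre_i, pre_f, pre_o, pre_g, c_prev):
        i = sigmoid(pi)        # input gate
        f = sigmoid(pf)        # forget gate
        o = sigmoid(po)        # output gate
        g = math.tanh(pg)      # candidate cell value
        c_new = f * c + i * g  # updated cell state
        c_next.append(c_new)
        h_next.append(o * math.tanh(c_new))  # updated hidden state
    return h_next, c_next


if __name__ == "__main__":
    h, c = [0.0, 0.0], [0.0, 0.0]
    # Feed a short sequence of two-element "frames" through the cell.
    for frame in ([1.0, -1.0], [0.5, 0.5], [0.0, 2.0]):
        h, c = lstm_style_step(frame, h, c)
    print(h, c)
```

The mixer is passed in as an argument so that the gating logic stays separate from whatever block produces the gate pre-activations, which mirrors the summary's description of a recurrent unit built around a swappable vision backbone.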
Findings
Achieves competitive results on various spatiotemporal tasks.
Maintains a smaller model size than existing methods.
Demonstrates superior long-sequence modeling capabilities.
Abstract
Combining CNNs or ViTs with RNNs for spatiotemporal forecasting has yielded unparalleled results in predicting temporal and spatial dynamics. However, modeling extensive global information remains a formidable challenge; CNNs are limited by their narrow receptive fields, and ViTs struggle with the intensive computational demands of their attention mechanisms. The emergence of recent Mamba-based architectures has been met with enthusiasm for their exceptional long-sequence modeling capabilities, surpassing established vision models in efficiency and accuracy, which motivates us to develop an innovative architecture tailored for spatiotemporal forecasting. In this paper, we propose the VMRNN cell, a new recurrent unit that integrates the strengths of Vision Mamba blocks with LSTM. We construct a network centered on VMRNN cells to tackle spatiotemporal prediction tasks effectively. Our…
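The efficiency argument in the abstract can be illustrated with a small sketch. Mamba-style models build on state-space recurrences, which process a sequence with one state update per token, whereas full self-attention scores every pair of tokens. The code below is a fixed-parameter, scalar state-space scan and is only an illustration: Mamba itself makes the parameters input-dependent and uses a specialized scan, neither of which is shown here. The names `ssm_scan` and `attention_pair_count` are hypothetical.

```python
def ssm_scan(xs, a, b, c):
    """Scalar linear state-space recurrence.

    h_t = a * h_{t-1} + b * x_t
    y_t = c * h_t

    The loop touches each token once, so the cost grows linearly with the
    sequence length.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def attention_pair_count(n):
    """Number of query-key scores computed by full self-attention over n tokens."""
    return n * n


if __name__ == "__main__":
    tokens = [1.0, 0.0, 0.0, 0.0]
    print(ssm_scan(tokens, a=0.5, b=1.0, c=1.0))
    print(len(tokens), "state updates vs", attention_pair_count(len(tokens)), "attention scores")
```

With a decay factor `a` below 1, the output shows an early input fading gradually rather than disappearing, which is the basic mechanism by which such a recurrence carries information across a long sequence.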
Taxonomy
Topics: Remote Sensing and LiDAR Applications · Remote Sensing in Agriculture · Satellite Image Processing and Photogrammetry
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
