SimVPv2: Towards Simple yet Powerful Spatiotemporal Predictive Learning
Cheng Tan, Zhangyang Gao, Siyuan Li, Stan Z. Li

TL;DR
SimVPv2 introduces a simplified, CNN-based spatiotemporal predictive model that outperforms previous complex architectures in accuracy and efficiency across multiple datasets.
Contribution
It presents a novel streamlined architecture with Gated Spatiotemporal Attention, eliminating the need for heavy Unet-like structures, and achieves state-of-the-art results with reduced complexity.
Findings
Superior performance on Moving MNIST with fewer FLOPs
Faster training and inference times
Robust generalization across diverse datasets
Abstract
Recent years have witnessed remarkable advances in spatiotemporal predictive learning, with methods incorporating auxiliary inputs, complex neural architectures, and sophisticated training strategies. While SimVP introduced a simpler, CNN-based baseline for this task, it still relies on heavy Unet-like architectures for spatial and temporal modeling and therefore still suffers from high complexity and computational overhead. In this paper, we propose SimVPv2, a streamlined model that eliminates the need for Unet architectures and demonstrates that plain stacks of convolutional layers, enhanced with an efficient Gated Spatiotemporal Attention mechanism, can deliver state-of-the-art performance. SimVPv2 not only simplifies the model architecture but also improves both performance and computational efficiency. On the standard Moving MNIST benchmark, SimVPv2 achieves superior performance…
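The abstract names a Gated Spatiotemporal Attention mechanism but does not spell out its internals here. As a rough illustration of what a gating step of this kind can look like, the sketch below splits a feature map into a value half and a gate half along the channel axis and multiplies the values by the sigmoid of the gates. The channel layout, the function name `gated_attention`, and the use of nested lists are assumptions made for this example, not details taken from the paper, and the convolutions that would produce the features are omitted.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, used here to squash gate values into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention(features):
    """Apply an element-wise sigmoid gate to a feature map.

    `features` is a list of 2*C channels, each an H x W nested list.
    Assumed layout (hypothetical): the first C channels are values,
    the last C channels are the gates that modulate them.
    """
    if len(features) % 2 != 0:
        raise ValueError("expected an even number of channels")
    half = len(features) // 2
    values, gates = features[:half], features[half:]
    # Each output element is sigmoid(gate) * value at the same position.
    return [
        [
            [sigmoid(g) * v for v, g in zip(v_row, g_row)]
            for v_row, g_row in zip(v_channel, g_channel)
        ]
        for v_channel, g_channel in zip(values, gates)
    ]


# Toy input: one value channel and one gate channel on a 1 x 2 grid.
toy = [[[2.0, 4.0]], [[0.0, 0.0]]]
out = gated_attention(toy)
```

With all gates at zero, every value is scaled by sigmoid(0) = 0.5, so the toy output is half the value channel. In a real model the gate and value features would come from learned convolutions rather than being supplied by hand.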
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Data Stream Mining Techniques · Machine Learning and Data Classification
