CNN-based Multi-In-Multi-Out Model for Efficient Spatiotemporal Prediction
Hyeonseok Jin

TL;DR
This paper introduces MIMO-ESP, a CNN-based model that effectively captures global spatiotemporal information and improves efficiency for prediction tasks, outperforming existing models on multiple benchmarks.
Contribution
The paper proposes a novel CNN-based multi-in-multi-out architecture that overcomes the limitations of local kernels and reduces complexity by integrating a Transformer-inspired design.
Findings
MIMO-ESP outperforms existing models on video, traffic, and precipitation datasets.
The model achieves high efficiency with competitive accuracy.
Ablation studies confirm the effectiveness of MIMO-ESP components.
Abstract
Recently, models based on Convolutional Neural Network (CNN) or Transformer architectures have been proposed to overcome the limitations of Recurrent Neural Network (RNN) based models in spatiotemporal prediction. These models avoid the limited parallelization caused by sequential processing and the error accumulation caused by recursive prediction, and they achieve high performance. Nevertheless, some challenges remain. First, CNN-based models have difficulty capturing global information because convolution kernels are local, which limits their performance. In addition, information is mixed because the time axis is merged with the channel axis of the image for processing. Models based on the Transformer architecture have high complexity due to the self-attention calculation and take a long time to train. In this paper, we propose a new structure model called CNN-based…
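The two costs named in the abstract can be made concrete with simple shape arithmetic. The sketch below is illustrative only and is not the MIMO-ESP architecture: the function names, the clip shape, and the patch size are all assumptions chosen for the example. It shows how merging the time axis into the channel axis changes a tensor's shape, and how the number of query-key pairs in full self-attention grows with the square of the token count.

```python
def fold_time_into_channels(shape):
    """Map a (B, T, C, H, W) shape to (B, T*C, H, W).

    After this step a 2D convolution sees frames and image channels
    as one undifferentiated channel axis, which is the "mixing"
    the abstract refers to.
    """
    b, t, c, h, w = shape
    return (b, t * c, h, w)


def attention_pair_count(t, h, w, patch):
    """Return (tokens, pairs) for full self-attention over a clip.

    Assumes each frame is split into non-overlapping patch x patch
    tokens and every token attends to every other token.
    """
    tokens = t * (h // patch) * (w // patch)
    return tokens, tokens * tokens


# Hypothetical input: batch of 16 clips, 10 grayscale frames of 64x64.
clip_shape = (16, 10, 1, 64, 64)
folded_shape = fold_time_into_channels(clip_shape)

# Hypothetical tokenization: 8x8 patches over the same clip.
tokens, pairs = attention_pair_count(t=10, h=64, w=64, patch=8)

print(folded_shape)   # (16, 10, 64, 64)
print(tokens, pairs)  # 640 409600
```

Doubling the number of frames in this setup doubles the folded channel count but quadruples the attention pair count, which is the efficiency gap the abstract describes between the two model families.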
