RAM: Replace Attention with MLP for Efficient Multivariate Time Series Forecasting
Suhan Guo, Jiahong Deng, Yi Wei, Hui Dou, Furao Shen, and Jian Zhao

TL;DR
This paper introduces RAM, a method that replaces the attention mechanism in time series forecasting models with simpler MLP components, significantly reducing computational cost while maintaining high accuracy.
Contribution
The paper proposes a novel pruning strategy, RAM, that approximates attention with feedforward layers, reducing FLOPs by over 40% with minimal performance loss.
Findings
FLOPs reduced by 62.579% in spatio-temporal models
FLOPs reduced by 42.233% in long-term forecasting models
Less than 2.5% performance drop in spatio-temporal models
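To give a feel for where savings of this kind can come from, the sketch below counts approximate FLOPs for one standard self-attention sublayer and one feedforward sublayer, then reports the share of the block's cost that sits in attention. It is a back-of-the-envelope illustration, not the paper's accounting: the function names, the convention that one multiply-add costs 2 FLOPs, the omission of softmax and normalization costs, and the sizes `n`, `d`, and `hidden` are all assumptions made here.

```python
def attention_flops(n: int, d: int) -> int:
    """Approximate FLOPs of one self-attention sublayer (n tokens, width d)."""
    qkv = 3 * 2 * n * d * d      # Q, K and V projections
    scores = 2 * n * n * d       # attention scores, Q @ K^T
    weighted = 2 * n * n * d     # attention scores applied to V
    out = 2 * n * d * d          # final output projection
    return qkv + scores + weighted + out


def ffn_flops(n: int, d: int, hidden: int) -> int:
    """Approximate FLOPs of a two-layer feedforward sublayer."""
    return 2 * 2 * n * d * hidden


def attention_share(n: int, d: int, hidden: int) -> float:
    """Fraction of one block's FLOPs spent in the attention sublayer."""
    att = attention_flops(n, d)
    return att / (att + ffn_flops(n, d, hidden))


# Hypothetical sizes, chosen only for illustration.
n, d, hidden = 96, 64, 256
print(f"attention share of block FLOPs: {100 * attention_share(n, d, hidden):.1f}%")
```

The share depends heavily on the sequence length relative to the model width, so the percentage printed here should not be compared directly with the figures reported above.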
Abstract
Attention-based architectures have become ubiquitous in time series forecasting tasks, including spatio-temporal (STF) and long-term time series forecasting (LTSF). Yet, our understanding of the reasons for their effectiveness remains limited. In this work, we propose a novel pruning strategy, Replace Attention with MLP (RAM), that approximates the attention mechanism using only feedforward layers, residual connections, and layer normalization for temporal and/or spatial modeling in multivariate time series forecasting. Specifically, the Q, K, and V projections, the attention score calculation, the dot-product between the attention score and the V, and the final projection can be removed from the attention-based networks without significantly degrading the performance, so that the given network remains the top-tier compared to other SOTA methods. RAM…
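As a rough illustration of the components the abstract names (feedforward layers, residual connections, and layer normalization), the NumPy sketch below builds one attention-free block. It is a minimal sketch under assumptions, not the authors' implementation: the name `mlp_block`, the ReLU activation, the post-norm placement, the layer norm without learnable scale and shift, and all tensor sizes are hypothetical choices made for this example.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize the last axis to zero mean and unit variance (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def mlp_block(x, w1, b1, w2, b2):
    """Attention-free block: feedforward layers, a residual connection, then layer norm.

    There are no Q/K/V projections, no attention scores and no output projection.
    """
    hidden = np.maximum(x @ w1 + b1, 0.0)    # first linear layer + ReLU (assumed activation)
    return layer_norm(x + hidden @ w2 + b2)  # second linear layer, residual, post-norm


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
n_tokens, d_model, d_hidden = 12, 16, 64
x = rng.normal(size=(n_tokens, d_model))
w1 = rng.normal(scale=0.1, size=(d_model, d_hidden))
b1 = np.zeros(d_hidden)
w2 = rng.normal(scale=0.1, size=(d_hidden, d_model))
b2 = np.zeros(d_model)

y = mlp_block(x, w1, b1, w2, b2)
print(y.shape)
```

In this sketch each row of `x` is transformed independently of the others, because nothing in the block mixes information across rows once the attention step is gone.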
Taxonomy
Topics: Time Series Analysis and Forecasting
Methods: Softmax · Attention Is All You Need · Pruning · Layer Normalization
