RAM: Replace Attention with MLP for Efficient Multivariate Time Series Forecasting

Suhan Guo, Jiahong Deng, Yi Wei, Hui Dou, Furao Shen, and Jian Zhao

arXiv:2410.24023 · cs.LG · May 13, 2025

Open Access

TL;DR

This paper introduces RAM, a method that replaces the attention mechanism in time series forecasting models with simpler MLP components, significantly reducing computational cost while maintaining high accuracy.

Contribution

The paper proposes a novel pruning strategy, RAM, that approximates attention with feedforward layers, reducing FLOPs by over 40% with minimal performance loss.

Findings

01. FLOPs reduced by 62.579% in spatio-temporal models
02. FLOPs reduced by 42.233% in long-term forecasting models
03. Less than 2.5% performance drop in spatio-temporal models

Abstract

Attention-based architectures have become ubiquitous in time series forecasting tasks, including spatio-temporal forecasting (STF) and long-term time series forecasting (LTSF). Yet our understanding of the reasons for their effectiveness remains limited. In this work, we propose a novel pruning strategy, Replace Attention with MLP (RAM), that approximates the attention mechanism using only feedforward layers, residual connections, and layer normalization for temporal and/or spatial modeling in multivariate time series forecasting. Specifically, the Q, K, and V projections, the attention score calculation, the dot product between the attention scores and V, and the final projection can be removed from attention-based networks without significantly degrading performance, so that the given network remains top-tier compared to other SOTA methods. RAM…
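To make concrete which computations the abstract says can be removed, the sketch below counts rough per-layer FLOPs for the four named attention components (Q/K/V projections, score calculation, scores times V, final projection) next to a two-layer feedforward sublayer. It is an illustrative estimate, not the paper's accounting: the function names, the convention that one multiply-add equals 2 FLOPs, and the example sizes are all assumptions, and softmax, layer normalization, residual additions, and biases are ignored.

```python
# Rough per-layer FLOP count for the attention components the abstract lists.
# Assumptions (not from the paper): one multiply-add counts as 2 FLOPs;
# softmax, layer normalization, residual adds, and biases are ignored.

def attention_flops(n_tokens: int, d_model: int) -> dict:
    """FLOPs of the four attention components named in the abstract."""
    proj = 2 * n_tokens * d_model * d_model   # (n x d) @ (d x d)
    mix = 2 * n_tokens * n_tokens * d_model   # n * n * d multiply-adds
    return {
        "qkv_projections": 3 * proj,   # Q, K, and V each need one projection
        "attention_scores": mix,       # Q @ K^T
        "scores_times_v": mix,         # scores @ V
        "output_projection": proj,     # final linear projection
    }

def feedforward_flops(n_tokens: int, d_model: int, d_hidden: int) -> int:
    """FLOPs of a two-layer MLP applied per token (d_model -> d_hidden -> d_model)."""
    return 2 * (2 * n_tokens * d_model * d_hidden)

def removed_fraction(n_tokens: int, d_model: int, d_hidden: int) -> float:
    """Share of attention + feedforward FLOPs that sits in the attention part."""
    attn = sum(attention_flops(n_tokens, d_model).values())
    ffn = feedforward_flops(n_tokens, d_model, d_hidden)
    return attn / (attn + ffn)

if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    for name, flops in attention_flops(96, 64).items():
        print(f"{name}: {flops}")
    print(f"attention share of block FLOPs: {removed_fraction(96, 64, 256):.3f}")
```

The two token-mixing terms grow quadratically with the number of tokens, so the attention share rises with longer inputs or more spatial nodes. The reductions reported above are measured on full models, so this toy count should not be expected to reproduce them.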

Taxonomy

Topics: Time Series Analysis and Forecasting

Methods: Softmax · Attention Is All You Need · Pruning · Layer Normalization