SPAT: Sensitivity-based Multihead-attention Pruning on Time Series Forecasting Models
Suhan Guo, Jiahong Deng, Mengjun Yi, Furao Shen, Jian Zhao

TL;DR
SPAT introduces a sensitivity-based structured pruning method that removes entire attention modules from time series forecasting models, reducing computational cost and speeding up inference without requiring specialized hardware.
Contribution
The paper presents a novel dynamic sensitivity metric, SEND, for selectively pruning attention modules, leading to more efficient models that outperform existing lightweight methods.
Findings
Achieved a 2.842% reduction in MSE and a 1.996% reduction in MAE.
Reduced FLOPs by 35.274%.
Outperformed state-of-the-art methods in standard and zero-shot inference.
Abstract
Attention-based architectures have achieved superior performance in multivariate time series forecasting but are computationally expensive. Techniques such as patching and adaptive masking have been developed to reduce their sizes and latencies. In this work, we propose a structured pruning method, SPAT (Sensitivity Pruner for Attention), which selectively removes redundant attention mechanisms and yields highly effective models. Different from previous approaches, SPAT aims to remove the entire attention module, which reduces the risk of overfitting and enables speed-up without demanding specialized hardware. We propose a dynamic sensitivity metric, Sensitivity Enhanced Normalized Dispersion (SEND), that measures the importance of each attention module during the pre-training phase. Experiments on multivariate…
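The abstract does not give the SEND formula, so the following is only a rough sketch of the general idea of scoring whole attention modules and skipping the least important ones. Everything here is an assumption made for illustration: the score is a plain coefficient of variation standing in for SEND, the rule "lowest score gets pruned" is a guess at the direction, and the names `normalized_dispersion`, `select_modules_to_prune`, `forward`, and the layer names are hypothetical.

```python
from statistics import mean, pstdev


def normalized_dispersion(samples):
    """Stand-in score (NOT the paper's SEND): population std over mean magnitude."""
    mu = mean(abs(s) for s in samples)
    if mu == 0.0:
        return 0.0
    return pstdev(samples) / mu


def select_modules_to_prune(sensitivity_log, prune_ratio):
    """Return the names of the modules with the lowest stand-in scores.

    sensitivity_log maps a module name to sensitivity values recorded
    during pre-training (hypothetical logging format).
    """
    scores = {name: normalized_dispersion(vals)
              for name, vals in sensitivity_log.items()}
    n_prune = int(len(scores) * prune_ratio)
    ranked = sorted(scores, key=scores.get)  # ascending: lowest score first
    return ranked[:n_prune]


def forward(x, layers, pruned):
    """Residual stack in which pruned attention modules are skipped entirely."""
    for name, attn in layers:
        if name in pruned:
            continue  # no attention computed; the residual path passes x through
        x = [xi + ai for xi, ai in zip(x, attn(x))]
    return x


# Hypothetical sensitivity traces for three attention modules.
sensitivity_log = {
    "layer0.attn": [0.9, 0.2, 1.4, 0.1],
    "layer1.attn": [0.5, 0.5, 0.5, 0.5],
    "layer2.attn": [0.3, 0.8, 0.2, 0.9],
}
pruned = select_modules_to_prune(sensitivity_log, prune_ratio=0.34)
print(pruned)
```

Because a pruned module is skipped as a whole rather than having individual weights zeroed, the saving shows up as fewer dense operations, which is why structured pruning of this kind does not depend on sparse-kernel hardware support.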
Taxonomy
Methods: Softmax · Attention Is All You Need · L1 Regularization · Activation Patching · Adaptive Masking · Masked autoencoder · Pruning
