Enhancing Transformer-based models for Long Sequence Time Series Forecasting via Structured Matrix
Zhicheng Zhang, Yong Wang, Shaoqi Tan, Bowei Xia, Yujie Luo

TL;DR
This paper introduces a framework that enhances Transformer models for long sequence time series forecasting by replacing the self-attention and feed-forward layers with surrogate blocks, reporting an average performance improvement of 12.4% and a 61.3% reduction in parameter count.
Contribution
The paper proposes Surrogate Attention Blocks and Surrogate Feed-Forward Blocks to reduce complexity while maintaining model expressiveness in Transformer-based forecasting.
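To make the idea of trading a dense layer for a structured one concrete, here is a minimal sketch in plain Python. This summary does not say which structured matrix the paper uses, so the block-diagonal form below is an assumption chosen only for illustration, and the function names (dense_param_count, block_diag_param_count, block_diag_matvec) are hypothetical. It is not the authors' SAB or SFB implementation, and the numbers it produces are unrelated to the paper's reported results.

```python
# Illustrative only: a block-diagonal matrix stands in for "a structured matrix".
# ASSUMPTION: the paper's surrogate blocks may use a different structure entirely.

def dense_param_count(n):
    """Number of weights in a dense n x n matrix."""
    return n * n


def block_diag_param_count(n, num_blocks):
    """Number of weights in a block-diagonal n x n matrix with equal square blocks."""
    if n % num_blocks != 0:
        raise ValueError("n must be divisible by num_blocks")
    block = n // num_blocks
    return num_blocks * block * block


def block_diag_matvec(blocks, x):
    """Multiply a block-diagonal matrix, given as a list of square blocks, by x."""
    out = []
    offset = 0
    for blk in blocks:
        size = len(blk)
        segment = x[offset:offset + size]
        # Each block only sees its own slice of the input vector.
        for row in blk:
            out.append(sum(w * v for w, v in zip(row, segment)))
        offset += size
    if offset != len(x):
        raise ValueError("block sizes must sum to len(x)")
    return out


# Two 2x2 blocks acting on a length-4 vector.
blocks = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
y = block_diag_matvec(blocks, [1, 1, 1, 1])

# A 512 x 512 layer split into 4 blocks keeps a quarter of the dense weights.
ratio = block_diag_param_count(512, 4) / dense_param_count(512)
```

With equal blocks, the weight count falls from n^2 to n^2 divided by the number of blocks, which is the general mechanism by which a structured matrix can cut parameters. How much expressiveness is retained depends on the structure chosen, and that is what the paper's surrogate blocks are designed to address.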
Findings
Average performance improvement of 12.4% across models
Parameter count reduced by 61.3%
Effective on five distinct time series tasks
Abstract
Recently, Transformer-based models for long sequence time series forecasting have demonstrated promising results. The self-attention mechanism, the core component of these models, shows great potential for capturing various dependencies among data points. Despite these advances, improving the efficiency of the self-attention mechanism remains a concern. Unfortunately, current specific optimization methods face challenges in applicability and scalability for the future design of long sequence time series forecasting models. Hence, in this article, we propose a novel architectural framework that enhances Transformer-based models through the integration of Surrogate Attention Blocks (SAB) and Surrogate Feed-Forward Neural Network Blocks (SFB). The framework reduces both time and space complexity by replacing the self-attention and…
Taxonomy
Topics: Time Series Analysis and Forecasting · Neural Networks and Applications
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Multi-Head Attention · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing · Byte Pair Encoding
