Enhancing Transformer-based models for Long Sequence Time Series Forecasting via Structured Matrix

Zhicheng Zhang; Yong Wang; Shaoqi Tan; Bowei Xia; Yujie Luo

arXiv:2405.12462 · cs.LG · December 17, 2024

TL;DR

This paper introduces a framework that enhances Transformer models for long sequence time series forecasting by replacing their self-attention and feed-forward layers with surrogate blocks, improving both efficiency and forecasting performance.

Contribution

The paper proposes Surrogate Attention Blocks and Surrogate Feed-Forward Blocks to reduce complexity while maintaining model expressiveness in Transformer-based forecasting.

Findings

1. Average performance improvement of 12.4% across models
2. Parameter count reduced by 61.3%
3. Effective on five distinct time series tasks

Abstract

Recently, Transformer-based models for long sequence time series forecasting have demonstrated promising results. The self-attention mechanism, the core component of these models, shows great potential in capturing various dependencies among data points. Despite these advances, improving the efficiency of the self-attention mechanism remains a concern. Unfortunately, current specific optimization methods face challenges in applicability and scalability for the future design of long sequence time series forecasting models. Hence, in this article, we propose a novel architectural framework that enhances Transformer-based models through the integration of Surrogate Attention Blocks (SAB) and Surrogate Feed-Forward Neural Network Blocks (SFB). The framework reduces both time and space complexity by replacing the self-attention and…
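
The abstract is truncated before it describes the structured matrix itself. The linked repository name (MonarchAttn) hints at a Monarch-style factorization, but that is an assumption here, not something stated on this page. As a rough illustration of why a structured matrix can need fewer parameters than a dense layer, the NumPy sketch below applies two block-diagonal factors interleaved with a fixed transpose permutation. All function names are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def block_diag_matvec(blocks, x):
    """Multiply a block-diagonal matrix, stored as its b square blocks, by x."""
    b, size, _ = blocks.shape
    return np.einsum("kij,kj->ki", blocks, x.reshape(b, size)).reshape(-1)


def transpose_permute(x, b):
    """View x as a b-by-b grid and transpose it (a fixed, parameter-free permutation)."""
    return x.reshape(b, b).T.reshape(-1)


def structured_matvec(left, right, x):
    """Apply P @ L @ P @ R to x, where L and R are block-diagonal (assumed layout)."""
    b = right.shape[0]
    y = block_diag_matvec(right, x)   # mix within each contiguous block
    y = transpose_permute(y, b)       # regroup so the next factor mixes across blocks
    y = block_diag_matvec(left, y)
    return transpose_permute(y, b)    # undo the regrouping


def param_counts(n):
    """Return (dense, structured) weight counts for an n-by-n map, n a perfect square."""
    b = int(round(n ** 0.5))
    if b * b != n:
        raise ValueError("n must be a perfect square in this sketch")
    return n * n, 2 * b * b * b


rng = np.random.default_rng(0)
b = 8
n = b * b
left = rng.standard_normal((b, b, b))
right = rng.standard_normal((b, b, b))
x = rng.standard_normal(n)

y = structured_matvec(left, right, x)
dense, structured = param_counts(n)
print(y.shape, dense, structured)  # (64,) 4096 1024
```

For n = 64 this toy layout stores 1,024 weights instead of 4,096. The 61.3% reduction listed under Findings refers to the full models and is not something this example reproduces.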

Code & Models

Repositories

newbeezzc/MonarchAttn (PyTorch, official)

Taxonomy

Topics: Time Series Analysis and Forecasting · Neural Networks and Applications

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Multi-Head Attention · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing · Byte Pair Encoding