Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Haoyi Zhou; Shanghang Zhang; Jieqi Peng; Shuai Zhang; Jianxin Li; Hui Xiong; Wancai Zhang

arXiv:2012.07436 · cs.LG · March 30, 2021 · 466 cites

TL;DR

The paper introduces Informer, an efficient Transformer-based model for long sequence time-series forecasting (LSTF). It targets the quadratic time complexity and high memory usage that keep the standard Transformer from being applied directly to tasks such as electricity consumption planning.

Contribution

The paper proposes a novel ProbSparse self-attention mechanism, self-attention distilling, and a generative decoder to enhance long sequence forecasting efficiency and performance.

Findings

01. Informer outperforms existing methods on large-scale datasets.

02. ProbSparse self-attention achieves O(L log L) time complexity and memory usage, down from the quadratic cost of standard self-attention.

03. Long sequence predictions are significantly faster with the generative decoder.
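To give a rough sense of what the O(L log L) finding means, the sketch below compares the number of query-key score entries in quadratic attention with an L log L budget. The sequence lengths, the use of the natural logarithm, and the omission of constant factors are illustrative assumptions, not figures taken from this page.

```python
import math


def attention_cost_ratio(seq_len: int) -> float:
    """Return L^2 / (L * ln L): how many times more score entries
    quadratic attention touches than an L*log(L) budget would.
    Constant factors are ignored (an assumption of this sketch)."""
    quadratic = seq_len * seq_len
    log_linear = seq_len * math.log(seq_len)
    return quadratic / log_linear


# Hypothetical sequence lengths chosen only for illustration.
for seq_len in (96, 336, 720):
    print(seq_len, round(attention_cost_ratio(seq_len), 1))
```

The ratio simplifies to L / ln L, so the gap widens as the sequence grows, which is why the saving matters most for long inputs.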

Abstract

Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a $ProbSparse$ self-attention mechanism, which achieves $O(L \log L)$ in time complexity and memory usage,…
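The abstract names ProbSparse self-attention, but this page does not describe how it works. The following is a minimal sketch under a commonly described formulation, stated here as an assumption: each query is scored by the maximum minus the mean of its scaled dot products with the keys, only the top u ≈ c · ln L queries compute full attention, and the remaining queries receive the mean of the values. The function name `probsparse_attention`, the `factor` parameter, and the exhaustive scoring of every query-key pair are choices of this sketch. Because it scores all pairs, it illustrates the selection rule only and does not itself reach O(L log L).

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def probsparse_attention(queries, keys, values, factor=1.0):
    """Simplified ProbSparse-style attention (assumed formulation).

    Returns (output rows, sorted indices of the 'active' queries).
    """
    len_q, len_k, d = len(queries), len(keys), len(queries[0])
    scale = 1.0 / math.sqrt(d)
    # Simplification: score every query-key pair (this step is O(L^2)).
    scores = [[dot(q, k) * scale for k in keys] for q in queries]
    # Sparsity measure per query: max score minus mean score.
    sparsity = [max(row) - sum(row) / len_k for row in scores]
    # Keep u = ceil(factor * ln(L_Q)) queries, at least 1, at most L_Q.
    u = min(len_q, max(1, math.ceil(factor * math.log(len_q))))
    active = sorted(range(len_q), key=lambda i: sparsity[i], reverse=True)[:u]
    # "Lazy" queries fall back to the mean of the values.
    dim_v = len(values[0])
    mean_v = [sum(v[j] for v in values) / len_k for j in range(dim_v)]
    output = [list(mean_v) for _ in range(len_q)]
    # Active queries get ordinary softmax attention.
    for i in active:
        weights = softmax(scores[i])
        output[i] = [sum(w * v[j] for w, v in zip(weights, values))
                     for j in range(dim_v)]
    return output, sorted(active)


# Tiny demo on random data (hypothetical sizes).
random.seed(0)
L, d = 16, 4
q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(L)]
k = [[random.gauss(0, 1) for _ in range(d)] for _ in range(L)]
v = [[random.gauss(0, 1) for _ in range(d)] for _ in range(L)]
out, active = probsparse_attention(q, k, v)
print("active queries:", active)
```

An efficient version would estimate the sparsity measure from a sampled subset of keys instead of the full score matrix; that sampling step is left out here to keep the selection logic easy to read.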

Taxonomy

Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods · Music and Audio Processing

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Multi-Head Attention · Residual Connection · Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Dropout