Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, Wancai Zhang

TL;DR
The paper introduces Informer, an efficient Transformer-based model designed for long sequence time-series forecasting, addressing computational challenges and improving prediction accuracy for applications like electricity consumption planning.
Contribution
The paper proposes a novel ProbSparse self-attention mechanism, self-attention distilling, and a generative decoder to enhance long sequence forecasting efficiency and performance.
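To make the ProbSparse idea more concrete, here is a minimal pure-Python sketch. It rests on assumptions that go beyond this summary: each query is scored by the maximum minus the mean of its scaled dot products with the keys, only the top u = ceil(c * ln L) queries receive full softmax attention, and every other query falls back to the mean of the values. The function name `probsparse_attention` and the constant `c` are illustrative. For readability the sketch scores every query against every key, so it does not itself reach sub-quadratic cost; it only shows the query-selection step.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def probsparse_attention(Q, K, V, c=1.0):
    """Simplified ProbSparse-style attention on lists of row vectors.

    Returns (outputs, sorted indices of the queries that got full attention).
    """
    L_q, L_k, d = len(Q), len(K), len(Q[0])
    scale = 1.0 / math.sqrt(d)

    # Scaled dot product of every query with every key (assumption: no key
    # sampling here, so this step is still quadratic in this sketch).
    scores = [[scale * sum(q[t] * k[t] for t in range(d)) for k in K] for q in Q]

    # Sparsity score per query: max minus mean of its scores.
    sparsity = [max(row) - sum(row) / L_k for row in scores]

    # Keep the u highest-scoring ("active") queries, u = ceil(c * ln L_q).
    u = min(L_q, max(1, math.ceil(c * math.log(L_q))))
    active = sorted(range(L_q), key=lambda i: sparsity[i], reverse=True)[:u]

    # Every query starts with the mean of V as its output.
    d_v = len(V[0])
    mean_v = [sum(v[t] for v in V) / L_k for t in range(d_v)]
    out = [list(mean_v) for _ in range(L_q)]

    # Active queries get ordinary softmax attention instead.
    for i in active:
        w = softmax(scores[i])
        out[i] = [sum(w[j] * V[j][t] for j in range(L_k)) for t in range(d_v)]
    return out, sorted(active)


random.seed(0)
L, d = 32, 4
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(L)]
out, active = probsparse_attention(X, X, X)
print(len(active), "of", L, "queries received full attention")
```

With L = 32 and c = 1, u = ceil(ln 32) = 4, so only four queries are attended in full and the remaining 28 share the same averaged output.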
Findings
Informer outperforms existing methods on large-scale datasets.
The model's self-attention achieves O(L log L) time complexity and memory usage, compared with the quadratic cost of standard Transformer self-attention.
Long sequence predictions are significantly faster with the generative decoder.
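As a rough illustration of the complexity finding, the short script below compares the number of query-key score evaluations under an L² cost model and an L · ceil(ln L) cost model. The sequence lengths are arbitrary examples, constant factors are ignored, and `attention_cost` is a hypothetical helper, so the figures indicate scaling only, not measured runtime or memory.

```python
import math


def attention_cost(L):
    """Return (quadratic, log-linear) score-evaluation counts for length L."""
    full = L * L                          # every query against every key
    sparse = L * math.ceil(math.log(L))   # about ln(L) evaluations per position
    return full, sparse


for L in (96, 336, 720):
    full, sparse = attention_cost(L)
    print(f"L={L}: full={full}, log-linear={sparse}, ratio={full / sparse:.1f}")
```

At L = 720 the two counts are 518,400 and 5,040, a gap of roughly two orders of magnitude, and the gap widens as L grows.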
Abstract
Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a ProbSparse self-attention mechanism, which achieves O(L log L) in time complexity and memory usage,…
Taxonomy
Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods · Music and Audio Processing
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Multi-Head Attention · Residual Connection · Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Dropout
