Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting

Shiyang Li; Xiaoyong Jin; Yao Xuan; Xiyou Zhou; Wenhu Chen; Yu-Xiang Wang; Xifeng Yan

arXiv:1907.00235 · cs.LG · January 6, 2020 · 1.0k cites

TL;DR

This paper introduces a convolutional self-attention mechanism and the LogSparse Transformer, which help the model exploit local context and reduce its memory usage, significantly improving accuracy on long-term time series forecasting.

Contribution

It proposes convolutional self-attention to incorporate local context, and the LogSparse Transformer to break the memory bottleneck in long-sequence modeling.
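A minimal sketch of a LogSparse-style attention mask in Python with NumPy. The text above does not spell out which positions each cell attends to, so the pattern here is an assumption: position l attends to itself and to earlier positions at exponentially growing offsets l - 1, l - 2, l - 4, ..., dropping any offset that would fall before the start of the sequence. The helper names `logsparse_mask` and `receptive_field` are hypothetical. The second function checks the idea behind stacking layers: with O(log L) attended cells per position, about log2 L stacked layers let every position receive information from all earlier positions.

```python
import numpy as np

def logsparse_mask(L: int) -> np.ndarray:
    """Boolean (L, L) mask; mask[l, j] is True if position l may attend to position j.

    Assumed pattern (hypothetical helper): each position attends to itself and to
    earlier positions at offsets 1, 2, 4, 8, ..., skipping offsets that go below 0.
    """
    mask = np.zeros((L, L), dtype=bool)
    for l in range(L):
        mask[l, l] = True            # every position attends to itself
        step = 1
        while l - step >= 0:         # exponentially growing look-back offsets
            mask[l, l - step] = True
            step *= 2
    return mask

def receptive_field(mask: np.ndarray, n_layers: int) -> np.ndarray:
    """Which inputs can influence each output after stacking n_layers masked layers."""
    m = mask.astype(int)
    reach = np.eye(len(mask), dtype=int)
    for _ in range(n_layers):
        reach = ((reach @ m) > 0).astype(int)   # compose one more attention hop
    return reach.astype(bool)

L = 8
mask = logsparse_mask(L)
full_causal = np.tril(np.ones((L, L), dtype=bool))
print("attended pairs:", mask.sum(), "vs. dense causal:", full_causal.sum())
print("full coverage after 3 layers:", np.array_equal(receptive_field(mask, 3), full_causal))
```

For L = 8 this keeps 25 attended pairs instead of 36 for dense causal attention, with at most 4 per row. Three layers (log2 8) restore full causal coverage because any offset up to 7 is a sum of at most three distinct powers of two; two layers are not enough.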

Findings

01

Outperforms state-of-the-art methods on real-world datasets

02

Reduces memory complexity from $O(L^2)$ to $O(L(\log L)^2)$

03

Improves forecasting accuracy for long sequences
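To make the complexity claim concrete, the short Python sketch below compares the entry counts implied by the two growth rates for a few sequence lengths. It ignores constant factors and uses log base 2, so the numbers show how the gap widens with $L$ rather than the actual memory use of either model; `attention_memory` is a hypothetical helper.

```python
import math

def attention_memory(L: int) -> tuple[int, int]:
    """Rough entry counts, ignoring constants: dense L^2 vs. L * (log2 L)^2."""
    dense = L * L
    logsparse = int(L * math.log2(L) ** 2)
    return dense, logsparse

for L in (256, 1024, 4096):
    dense, sparse = attention_memory(L)
    print(f"L={L:5d}  L^2={dense:>10,}  L(log L)^2={sparse:>8,}  ratio={dense / sparse:.1f}x")
```

For L = 1024 this gives 1,048,576 versus 102,400 entries, roughly a 10x reduction, and the ratio L/(log2 L)^2 keeps growing as L increases.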

Abstract

Time series forecasting is an important problem across many domains, including predicting solar plant energy output, electricity consumption, and traffic jam situations. In this paper, we propose to tackle such forecasting problems with Transformer [1]. Although impressed by its performance in our preliminary study, we found two major weaknesses: (1) locality-agnostics: the point-wise dot-product self-attention in the canonical Transformer architecture is insensitive to local context, which can make the model prone to anomalies in time series; (2) memory bottleneck: the space complexity of the canonical Transformer grows quadratically with sequence length $L$, making direct modeling of long time series infeasible. To solve these two issues, we first propose convolutional self-attention, which produces queries and keys with causal convolution so that local context can be better…
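The convolutional self-attention idea in the abstract can be sketched in NumPy: queries and keys come from a causal 1-D convolution over the last k time steps instead of a point-wise projection, while values keep the ordinary projection. This is a minimal single-head illustration under assumptions of ours (left zero padding, a causal attention mask, and the hypothetical helper names `causal_conv1d` and `conv_self_attention`), not the authors' implementation. With kernel size k = 1 the convolution collapses to a point-wise projection, recovering canonical self-attention.

```python
import numpy as np

def causal_conv1d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Causal 1-D convolution (hypothetical helper).

    x: (L, d_in) sequence, w: (k, d_in, d_out) kernel.
    Output row t depends only on x[t-k+1 .. t]; the sequence is left-padded with zeros.
    """
    k, d_in, d_out = w.shape
    L = x.shape[0]
    padded = np.vstack([np.zeros((k - 1, d_in)), x])
    out = np.zeros((L, d_out))
    for t in range(L):
        window = padded[t:t + k]                  # equals x[t-k+1 .. t] (zeros before start)
        out[t] = np.einsum("ki,kio->o", window, w)
    return out

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)         # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def conv_self_attention(x, wq, wk, wv):
    """Single-head self-attention with convolutional queries/keys and a causal mask."""
    L = x.shape[0]
    q = causal_conv1d(x, wq)                      # each query summarizes a local window
    key = causal_conv1d(x, wk)                    # keys are built the same way
    v = x @ wv                                    # values: ordinary point-wise projection
    scores = q @ key.T / np.sqrt(q.shape[1])
    future = np.triu(np.ones((L, L), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)    # forbid attending to future steps
    return softmax(scores) @ v

rng = np.random.default_rng(0)
L, d, k = 6, 4, 3
x = rng.normal(size=(L, d))
wq = rng.normal(size=(k, d, d))
wk = rng.normal(size=(k, d, d))
wv = rng.normal(size=(d, d))
out = conv_self_attention(x, wq, wk, wv)
print(out.shape)  # (6, 4)
```

Perturbing the last time step leaves every earlier output unchanged, which is the property the causal convolution and the mask are meant to preserve for forecasting.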

Taxonomy

Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods · Advanced Text Analysis Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax