A Time Series is Worth 64 Words: Long-term Forecasting with Transformers

Yuqi Nie; Nam H. Nguyen; Phanwadee Sinthong; Jayant Kalagnanam

arXiv:2211.14730 · cs.LG · March 7, 2023 · 538 cites

TL;DR

This paper introduces PatchTST, a Transformer-based model for multivariate time series forecasting that uses patching and channel-independence to improve accuracy, efficiency, and transfer learning capabilities.

Contribution

The paper proposes a novel PatchTST model with patching and channel-independence, enhancing long-term forecasting and self-supervised learning in time series analysis.

Findings

01. Significant improvement over SOTA Transformer models in long-term forecasting accuracy.

02. Effective self-supervised pre-training that outperforms supervised training on large datasets.

03. Successful transfer of pre-trained representations across different datasets.

Abstract

We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches, which serve as input tokens to the Transformer; (ii) channel-independence, where each channel contains a single univariate time series and all series share the same embedding and Transformer weights. The patching design naturally has a three-fold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced given the same look-back window; and the model can attend to a longer history. Our channel-independent patch time series Transformer (PatchTST) can improve long-term forecasting accuracy significantly when compared with SOTA Transformer-based models. We…
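The two components described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `make_patches` is hypothetical, and the look-back window of 512, patch length of 16, and stride of 8 are illustrative values chosen here, not taken from the text above. The sketch only shows how a multivariate batch might be cut into overlapping patches and how each channel can be handled as a separate univariate series; no padding, embedding, or attention is included.

```python
import numpy as np

def make_patches(x, patch_len, stride):
    """Cut x of shape (batch, channels, length) into overlapping patches.

    Returns an array of shape (batch * channels, num_patches, patch_len).
    Folding channels into the batch axis means every channel is processed
    as its own univariate series (channel-independence), so one shared
    model could be applied to all of them.
    """
    batch, channels, length = x.shape
    # Every window of size patch_len along the time axis...
    windows = np.lib.stride_tricks.sliding_window_view(x, patch_len, axis=-1)
    # ...then keep only every stride-th window.
    patches = windows[:, :, ::stride, :]  # (batch, channels, num_patches, patch_len)
    num_patches = patches.shape[2]
    return patches.reshape(batch * channels, num_patches, patch_len)

# Illustrative sizes (assumptions, not from the text above).
look_back, patch_len, stride = 512, 16, 8
x = np.random.default_rng(0).normal(size=(2, 7, look_back))

tokens = make_patches(x, patch_len, stride)
num_patches = tokens.shape[1]

# Attention cost grows with the square of the token count, so going from
# one token per time step to one token per patch shrinks the attention map.
reduction = (look_back ** 2) / (num_patches ** 2)
print(tokens.shape, round(reduction, 1))
```

With these assumed sizes, each of the 14 univariate series (2 samples times 7 channels) becomes a sequence of 63 patch tokens instead of 512 point-wise tokens, which is where the quadratic saving in attention-map size comes from.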

Code & Models

Videos

A Time Series is Worth 64 Words: Long-term Forecasting with Transformers · slideslive

Taxonomy

Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods · Anomaly Detection Techniques and Applications

Methods: Attention Is All You Need · Layer Normalization · Softmax · Adam · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Linear Layer