A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam

TL;DR
This paper introduces PatchTST, a Transformer-based model for multivariate time series forecasting that uses patching and channel-independence to improve accuracy, efficiency, and transfer learning capabilities.
Contribution
The paper proposes a novel PatchTST model with patching and channel-independence, enhancing long-term forecasting and self-supervised learning in time series analysis.
Findings
Significant improvement over SOTA Transformer models in long-term forecasting accuracy.
Effective self-supervised pre-training that outperforms supervised training on large datasets.
Successful transfer of pre-trained representations across different datasets.
Abstract
We propose an efficient design of Transformer-based models for multivariate time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches, which serve as input tokens to the Transformer; (ii) channel-independence, where each channel contains a single univariate time series and all series share the same embedding and Transformer weights. The patching design has a threefold benefit: local semantic information is retained in the embedding; computation and memory usage of the attention maps are quadratically reduced given the same look-back window; and the model can attend to a longer history. Our channel-independent patch time series Transformer (PatchTST) significantly improves long-term forecasting accuracy compared with SOTA Transformer-based models. We…
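The patching and channel-independence steps described in the abstract can be sketched in NumPy. This is a minimal illustration, not the authors' implementation. The patch length 16, stride 8, 512-step look-back, and end-padding with `stride` copies of the last value follow the paper's PatchTST/64 setting as we understand it; the function names, batch sizes, and the random matrix `w_p` standing in for the learned patch embedding are hypothetical. With these settings a 512-step look-back yields 64 patches, the "64 words" of the title.

```python
import numpy as np


def make_patches(x: np.ndarray, patch_len: int = 16, stride: int = 8) -> np.ndarray:
    """Split one univariate series of length L into (possibly overlapping) patches.

    The end of the series is padded with `stride` copies of its last value,
    which gives floor((L - patch_len) / stride) + 2 patches.
    """
    padded = np.concatenate([x, np.repeat(x[-1], stride)])
    n_patches = (len(padded) - patch_len) // stride + 1
    # Each row is one patch; each patch becomes one input token.
    return np.stack(
        [padded[i * stride : i * stride + patch_len] for i in range(n_patches)]
    )


def channel_independent_embed(
    batch: np.ndarray, w_p: np.ndarray, patch_len: int = 16, stride: int = 8
) -> np.ndarray:
    """Patch and embed a multivariate batch with channel-independence.

    batch: (B, M, L) with M channels. Every channel is treated as its own
    univariate sample, and all channels share the same projection w_p
    (a hypothetical stand-in for the learned patch embedding).
    Returns tokens of shape (B * M, N, d_model).
    """
    b, m, l = batch.shape
    univariate = batch.reshape(b * m, l)  # fold channels into the batch axis
    patches = np.stack([make_patches(s, patch_len, stride) for s in univariate])
    return patches @ w_p  # the same weights are applied to every channel


rng = np.random.default_rng(0)
B, M, L, d_model = 2, 7, 512, 128  # hypothetical sizes
batch = rng.normal(size=(B, M, L))
w_p = rng.normal(size=(16, d_model))  # hypothetical shared patch embedding

tokens = channel_independent_embed(batch, w_p)
print(tokens.shape)  # (14, 64, 128)

# Attention-map entries per sample: one token per time step vs. one per patch.
n_tokens = tokens.shape[1]
print(L * L, n_tokens * n_tokens)  # 262144 4096
```

Folding the M channels into the batch axis is what lets one set of embedding (and, in the full model, Transformer) weights serve every series. Since each sample now carries N = 64 tokens instead of L = 512, its attention map shrinks from 262,144 to 4,096 entries, which is the quadratic saving the abstract refers to.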
Code & Models
- 🤗 ibm-research/patchtst-fm-r1 (model) · 27k downloads · ♡ 8
- 🤗 ibm-research/patchtst-etth1-pretrain (model) · 496 downloads · ♡ 2
- 🤗 ibm-granite/granite-timeseries-patchtst (model) · 5.3k downloads · ♡ 19
- 🤗 chungimungi/PatchTST-2-input-channels (model) · 4 downloads
- 🤗 ibm-granite/granite-timeseries-patchtst-fm-r1 (model) · 32 downloads
Taxonomy
Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods · Anomaly Detection Techniques and Applications
Methods: Attention Is All You Need · Layer Normalization · Softmax · Adam · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Linear Layer
