PatchMixer: A Patch-Mixing Architecture for Long-Term Time Series Forecasting
Zeying Gong, Yujin Tang, Junwei Liang

TL;DR
PatchMixer is a CNN-based architecture that preserves temporal information in long-term time series forecasting, outperforming Transformer-based models in accuracy and speed by using permutation-variant convolutions and dual forecasting heads.
Contribution
Introduces a permutation-variant CNN architecture built solely on depthwise separable convolutions, paired with dual forecasting heads (one linear, one nonlinear), for long-term time series forecasting.
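To make "permutation-variant" concrete, here is a minimal Python sketch. It is not the authors' code: the function names, the toy series, and the kernel are all hypothetical. It compares a bare dot-product self-attention (identity projections, no positional encoding) with a plain 1D convolution. Shuffling the input only reorders the attention outputs, whereas the convolution produces genuinely different values, so it is sensitive to temporal order.

```python
import math

def self_attention(xs):
    """Scalar dot-product self-attention with identity projections
    and no positional encoding (a deliberately stripped-down toy)."""
    out = []
    for q in xs:
        scores = [q * k for k in xs]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, xs)) / total)
    return out

def conv1d(xs, kernel):
    """Valid-mode 1D cross-correlation of a series with a kernel."""
    k = len(kernel)
    return [sum(kernel[j] * xs[i + j] for j in range(k))
            for i in range(len(xs) - k + 1)]

series = [0.1, 0.2, 0.4, 0.8]
shuffled = [0.8, 0.1, 0.4, 0.2]  # same values, different temporal order

attn_a, attn_b = self_attention(series), self_attention(shuffled)
conv_a, conv_b = conv1d(series, [1.0, -1.0]), conv1d(shuffled, [1.0, -1.0])

# Attention: the same set of outputs, merely reordered.
same_attention_values = all(
    abs(a - b) < 1e-9 for a, b in zip(sorted(attn_a), sorted(attn_b))
)
# Convolution: the outputs themselves change with the ordering.
same_conv_values = sorted(conv_a) == sorted(conv_b)

print(same_attention_values, same_conv_values)
```

In practice Transformers add positional encodings to recover order information; the sketch only isolates the property of the attention operation itself.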
Findings
Achieves 3.9% and 21.2% relative improvements over the state-of-the-art method and the best-performing CNN, respectively.
Runs 2-3 times faster than the most advanced models.
Effective on seven benchmark datasets.
Abstract
Although the Transformer has been the dominant architecture for time series forecasting tasks in recent years, a fundamental challenge remains: the permutation-invariant self-attention mechanism within Transformers leads to a loss of temporal information. To tackle this challenge, we propose PatchMixer, a novel CNN-based model. It introduces a permutation-variant convolutional structure to preserve temporal information. Diverging from conventional CNNs in this field, which often employ multiple scales or numerous branches, our method relies exclusively on depthwise separable convolutions. This allows us to extract both local features and global correlations using a single-scale architecture. Furthermore, we employ dual forecasting heads encompassing linear and nonlinear components to better model future curve trends and details. Our experimental results on seven time-series…
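The pipeline the abstract describes (patching, a depthwise separable convolution, then dual linear and nonlinear heads) can be sketched roughly as follows. This is an illustrative pure-Python toy under stated assumptions, not the PatchMixer implementation: treating patches as convolution channels, the patch length, stride, kernel size, hidden width, horizon, GELU activation, and the untrained random weights are all choices made here for illustration, and every function name is hypothetical.

```python
import math
import random

def make_patches(series, patch_len, stride):
    """Split a 1D series into (possibly overlapping) patches, in temporal order."""
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]

def depthwise_conv(channels, kernels):
    """Depthwise step: each channel is filtered by its own kernel (valid mode)."""
    out = []
    for x, w in zip(channels, kernels):
        k = len(w)
        out.append([sum(w[j] * x[i + j] for j in range(k))
                    for i in range(len(x) - k + 1)])
    return out

def pointwise_conv(channels, mix):
    """Pointwise (1x1) step: mix channels at each position with a weight matrix."""
    length = len(channels[0])
    return [[sum(w * ch[t] for w, ch in zip(row, channels)) for t in range(length)]
            for row in mix]

def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def linear(vec, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def forecast(series, patch_len=4, stride=2, kernel_size=3,
             hidden=8, horizon=3, seed=0):
    rng = random.Random(seed)  # untrained random weights, for illustration only
    patches = make_patches(series, patch_len, stride)
    n = len(patches)
    # Assumption: patches act as channels of the depthwise separable convolution.
    local = depthwise_conv(patches, rand_matrix(rng, n, kernel_size))
    mixed = pointwise_conv(local, rand_matrix(rng, n, n))
    feats = [gelu(v) for ch in mixed for v in ch]  # activate and flatten
    d = len(feats)
    # Head 1: a single linear map from features to the forecast horizon.
    lin = linear(feats, rand_matrix(rng, horizon, d), [0.0] * horizon)
    # Head 2: a small MLP (linear -> GELU -> linear).
    h = [gelu(v) for v in linear(feats, rand_matrix(rng, hidden, d), [0.0] * hidden)]
    nonlin = linear(h, rand_matrix(rng, horizon, hidden), [0.0] * horizon)
    # The two heads' outputs are summed (an assumption of this sketch).
    return [a + b for a, b in zip(lin, nonlin)]

series = [math.sin(0.5 * t) for t in range(12)]
prediction = forecast(series)
print(len(prediction))
```

The depthwise step looks only within each channel (local features), while the pointwise step mixes information across channels (broader correlations). That split is what lets a single-scale design cover both without multiple branches.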
Taxonomy
Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods · Forecasting Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Linear Layer · Label Smoothing · Absolute Position Encodings · Adam · Residual Connection · Layer Normalization · Softmax
