Unlocking the Power of Patch: Patch-Based MLP for Long-Term Time Series Forecasting
Peiwang Tang, Weitai Zhang

TL;DR
This paper introduces PatchMLP, a simple yet effective patch-based MLP model that outperforms complex Transformer models in long-term time series forecasting by emphasizing cross-variable interactions and using moving averages.
Contribution
The paper proposes a novel PatchMLP model that leverages patch mechanisms and channel interactions, challenging the dominance of Transformer-based models in LTSF tasks.
Findings
PatchMLP achieves state-of-the-art results on multiple datasets.
Simple linear layers with patch mechanisms outperform complex Transformers.
Cross-variable interactions significantly improve forecasting accuracy.
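The ideas named above (a moving-average split, patching, plain linear layers, and mixing across variables) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' architecture: the function names, the random untrained weights, and every hyperparameter (`patch_len`, `stride`, `d_model`, `horizon`, `kernel`) are assumptions chosen to show the data flow and tensor shapes.

```python
import numpy as np

def moving_average(x, kernel):
    """Smooth each variable along time with edge padding; x has shape (T, C)."""
    pad_left = (kernel - 1) // 2
    pad_right = kernel - 1 - pad_left
    padded = np.pad(x, ((pad_left, pad_right), (0, 0)), mode="edge")
    window = np.ones(kernel) / kernel
    return np.stack(
        [np.convolve(padded[:, c], window, mode="valid") for c in range(x.shape[1])],
        axis=1,
    )

def make_patches(x, patch_len, stride):
    """Cut a (T, C) series into overlapping patches of shape (C, N, patch_len)."""
    T, _ = x.shape
    starts = range(0, T - patch_len + 1, stride)
    return np.stack([x[s:s + patch_len].T for s in starts], axis=1)

def sketch_forward(x, patch_len=16, stride=8, d_model=32, horizon=24, kernel=13, seed=0):
    """Hypothetical forward pass with random weights; returns (horizon, C)."""
    rng = np.random.default_rng(seed)
    _, C = x.shape
    trend = moving_average(x, kernel)   # smooth component
    residual = x - trend                # what the moving average leaves behind
    outputs = []
    for part in (trend, residual):
        patches = make_patches(part, patch_len, stride)           # (C, N, P)
        n = patches.shape[1]
        w_embed = rng.normal(scale=0.1, size=(patch_len, d_model))
        tokens = patches @ w_embed                                 # (C, N, D)
        # Cross-variable interaction: a linear map over the variable axis,
        # added back as a residual (an assumed, simplified mixing step).
        w_mix = rng.normal(scale=0.1, size=(C, C))
        mixed = tokens + np.einsum("ij,jnd->ind", w_mix, tokens)   # (C, N, D)
        w_head = rng.normal(scale=0.1, size=(n * d_model, horizon))
        outputs.append(mixed.reshape(C, -1) @ w_head)              # (C, horizon)
    return sum(outputs).T                                          # (horizon, C)
```

Calling `sketch_forward` on an array of shape `(96, 7)` returns a forecast of shape `(24, 7)`. A channel-independent model would skip the `w_mix` step and process each variable separately.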
Abstract
Recent studies have attempted to refine the Transformer architecture to demonstrate its effectiveness in Long-Term Time Series Forecasting (LTSF) tasks. Although these models surpass many linear forecasting models with ever-improving performance, we remain skeptical of Transformers as a solution for LTSF. We attribute the effectiveness of these models largely to the adopted Patch mechanism, which enhances sequence locality to an extent yet fails to fully address the loss of temporal information inherent to the permutation-invariant self-attention mechanism. Further investigation suggests that simple linear layers augmented with the Patch mechanism may outperform complex Transformer-based LTSF models. Moreover, diverging from models that use channel independence, our research underscores the importance of cross-variable interactions in enhancing the performance of multivariate time series…
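The order-insensitivity of self-attention mentioned in the abstract can be checked numerically. The sketch below is a generic single-head attention without positional encodings, not code from the paper; the sizes and random weights are arbitrary assumptions. Strictly speaking, shuffling the input tokens shuffles the outputs in the same way (permutation equivariance), and any order-agnostic pooling of those outputs, such as a mean, is then unchanged.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention, no positional encoding."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d = 6, 4                      # arbitrary illustrative sizes
x = rng.normal(size=(seq_len, d))      # one token per time step (or patch)
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

perm = rng.permutation(seq_len)        # shuffle the temporal order
out = self_attention(x, w_q, w_k, w_v)
out_perm = self_attention(x[perm], w_q, w_k, w_v)

# Shuffled input gives the same outputs, shuffled the same way.
equivariant = np.allclose(out_perm, out[perm])
# Mean pooling over tokens therefore ignores the order entirely.
pooled_invariant = np.allclose(out_perm.mean(axis=0), out.mean(axis=0))
```

This is why Transformer forecasters depend on positional encodings or patching to reintroduce ordering information.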
Taxonomy
Topics: Neural Networks and Applications · Stock Market Forecasting Methods
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Label Smoothing · Adam · Absolute Position Encodings · Dropout
