LATST: Are Transformers Necessarily Complex for Time-Series Forecasting
Dizhen Liang

TL;DR
LATST is a new Transformer-based method for multivariate time series forecasting that addresses entropy collapse and training instability. It outperforms existing Transformer models, and on certain datasets matches linear models with fewer parameters, challenging the notion that Transformers must be complex.
Contribution
Introduces LATST, a Transformer variant that mitigates entropy collapse and training instability, achieving superior performance with fewer parameters in time series forecasting.
Findings
LATST outperforms existing Transformer models on multiple datasets.
LATST achieves competitive results with fewer parameters than some linear models on certain datasets.
The approach effectively mitigates training instability in Transformer-based forecasting.
Abstract
Transformer-based architectures have achieved remarkable success in natural language processing and computer vision. However, their performance in multivariate long-term forecasting often falls short of simpler linear baselines. Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain. To bridge this gap, we introduce LATST, a novel approach designed to mitigate entropy collapse and training instability, two common challenges in Transformer-based time series forecasting. We rigorously evaluate LATST across multiple real-world multivariate time series datasets, demonstrating its ability to outperform existing state-of-the-art Transformer models. Notably, LATST achieves competitive performance with fewer parameters than some linear models on certain datasets, highlighting its efficiency and effectiveness.
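The abstract does not define entropy collapse. As commonly used, the term refers to attention rows becoming nearly one-hot, so that their Shannon entropy approaches zero. The sketch below only illustrates how that quantity can be measured for standard scaled dot-product attention; it is not LATST's mechanism. The helper names (`softmax`, `row_entropy`, `attention_entropy`) and the random inputs are hypothetical and chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def row_entropy(weights):
    """Shannon entropy (in nats) of one attention row; zero weights contribute 0."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def attention_entropy(queries, keys):
    """Mean entropy of the rows of softmax(Q K^T / sqrt(d))."""
    d = len(keys[0])
    entropies = []
    for q in queries:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        entropies.append(row_entropy(softmax(logits)))
    return sum(entropies) / len(entropies)


# Illustrative random inputs (an assumption, not data from the paper).
rng = random.Random(0)
n_tokens, dim = 8, 16
Q = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]
K = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]

healthy = attention_entropy(Q, K)

# Scaling queries and keys inflates the logits, pushing each row toward one-hot.
Q_big = [[50.0 * v for v in row] for row in Q]
K_big = [[50.0 * v for v in row] for row in K]
collapsed = attention_entropy(Q_big, K_big)

# Upper bound: uniform attention over n_tokens keys has entropy log(n_tokens).
max_entropy = math.log(n_tokens)

print(f"healthy={healthy:.3f} collapsed={collapsed:.3f} max={max_entropy:.3f}")
```

Tracking this mean entropy per layer during training is one simple way to see whether attention is concentrating on single positions.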
Taxonomy
Topics: Time Series Analysis and Forecasting
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Softmax · Adam · Residual Connection
