LATST: Are Transformers Necessarily Complex for Time-Series Forecasting

Dizhen Liang

arXiv:2410.23749 · cs.LG · July 9, 2025

TL;DR

LATST is a new Transformer-based method for multivariate time series forecasting that addresses training challenges and outperforms existing models, often with fewer parameters, challenging the notion that Transformers must be complex.

Contribution

Introduces LATST, a Transformer variant that mitigates entropy collapse and training instability, achieving superior performance with fewer parameters in time series forecasting.

Findings

01. LATST outperforms existing Transformer models on multiple datasets.

02. LATST achieves competitive results with fewer parameters than some linear models.

03. The approach effectively mitigates training instability in Transformer-based forecasting.

Abstract

Transformer-based architectures have achieved remarkable success in natural language processing and computer vision. However, their performance in multivariate long-term forecasting often falls short of simpler linear baselines. Previous research has identified the traditional attention mechanism as a key factor limiting their effectiveness in this domain. To bridge this gap, we introduce LATST, a novel approach designed to mitigate entropy collapse and training instability, two common challenges in Transformer-based time series forecasting. We rigorously evaluate LATST across multiple real-world multivariate time series datasets, demonstrating its ability to outperform existing state-of-the-art Transformer models. Notably, LATST achieves competitive performance with fewer parameters than some linear models on certain datasets, highlighting its efficiency and effectiveness.
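The abstract names entropy collapse without defining it. As a rough illustration only, the sketch below computes the mean Shannon entropy of softmax attention rows, the quantity usually meant by attention entropy: it is close to log(n) when attention is spread evenly over n keys and close to zero when each query attends to a single key. This is not LATST's mechanism, which this page does not describe; the function names, shapes, and scaling factors are assumptions chosen for the example.

```python
import numpy as np


def softmax(scores, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention_entropy(queries, keys):
    """Mean Shannon entropy (in nats) of scaled dot-product attention rows.

    queries, keys: arrays of shape (n_tokens, d_model). Hypothetical helper,
    not taken from the paper.
    """
    d_model = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_model)
    weights = softmax(scores, axis=-1)
    # Clip inside the log so zero weights contribute 0 instead of NaN.
    log_weights = np.log(np.clip(weights, 1e-12, 1.0))
    row_entropy = -(weights * log_weights).sum(axis=-1)
    return float(row_entropy.mean())


rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
k = rng.normal(size=(8, 16))

# Small logits give near-uniform attention; large logits give near one-hot rows.
diffuse = attention_entropy(0.01 * q, 0.01 * k)
peaked = attention_entropy(10.0 * q, 10.0 * k)
max_entropy = float(np.log(8))  # upper bound for 8 keys

print(f"diffuse={diffuse:.3f}  peaked={peaked:.3f}  max={max_entropy:.3f}")
```

Tracking this value per layer during training is one common way to notice collapse: a sustained drop toward zero means the attention rows have become nearly one-hot.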

Taxonomy

Topics: Time Series Analysis and Forecasting

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Softmax · Adam · Residual Connection