Learning Novel Transformer Architecture for Time-series Forecasting
Juyuan Zhang, Wei Zhu, Jiechao Gao

TL;DR
This paper introduces AutoFormer-TS, a neural architecture search framework that designs specialized Transformer architectures for time-series forecasting, leading to improved accuracy over existing models.
Contribution
It presents a novel differentiable neural architecture search method, AB-DARTS, for optimizing Transformer components specifically for time-series prediction (TSP) tasks.
Findings
AutoFormer-TS outperforms state-of-the-art models in accuracy.
The framework effectively explores diverse attention and activation mechanisms.
It maintains reasonable training efficiency despite architecture complexity.
Abstract
Despite the success of Transformer-based models in time-series prediction (TSP) tasks, the existing Transformer architecture still faces limitations, and the literature lacks comprehensive explorations of alternative architectures. To address these challenges, we propose AutoFormer-TS, a novel framework that leverages a comprehensive search space for Transformer architectures tailored to TSP tasks. Our framework introduces a differentiable neural architecture search (DNAS) method, AB-DARTS, which improves upon existing DNAS approaches by enhancing the identification of optimal operations within the architecture. AutoFormer-TS systematically explores alternative attention mechanisms, activation functions, and encoding operations, moving beyond the traditional Transformer design. Extensive experiments demonstrate that AutoFormer-TS consistently outperforms state-of-the-art baselines…
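To make the idea of differentiable architecture search concrete, here is a minimal sketch of the continuous relaxation used by generic DARTS-style methods, applied to the choice of activation function. It is not AB-DARTS: this summary does not describe how AB-DARTS identifies the best operation, so the argmax selection below is the plain DARTS rule. The candidate set, the function names (`mixed_op`, `select_op`), and the architecture weights are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical candidate activations; the paper's actual search space
# is not listed in this summary.
CANDIDATES = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "identity": lambda x: x,
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate outputs.

    `alphas` holds one architecture weight per candidate. In a real search
    these would be learned by gradient descent alongside the model weights.
    """
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATES.values()))


def select_op(alphas):
    """Plain DARTS discretization (assumption): keep the largest-weight op."""
    names = list(CANDIDATES)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]


if __name__ == "__main__":
    alphas = [0.1, 2.0, -1.0]  # illustrative values, not learned
    print(mixed_op(0.5, alphas))
    print(select_op(alphas))
```

During search every candidate contributes to the output, which keeps the choice differentiable; only after search is a single operation kept. The same pattern would apply to a set of candidate attention mechanisms or encoding operations.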
Taxonomy
Topics: Neural Networks and Applications · Time Series Analysis and Forecasting
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax
