Output Scaling: YingLong-Delayed Chain of Thought in a Large Pretrained Time Series Forecasting Model
Xue Wang, Tian Zhou, Jinyang Gao, Bolin Ding, Jingren Zhou

TL;DR
This paper introduces YingLong, a non-causal transformer model for time series forecasting that benefits from longer output sequences due to a delayed chain-of-thought effect, achieving state-of-the-art results across multiple datasets.
Contribution
The paper presents YingLong, a novel non-causal, bidirectional transformer for time series forecasting, demonstrating a new scaling effect where longer outputs improve accuracy.
Findings
YingLong achieves the best result in more than 60% of the zero-shot comparisons reported on the ETT and Weather datasets.
The model outperforms the best time series foundation models and end-to-end trained models by 14% and 44% in rank, respectively.
Longer output sequences significantly enhance model accuracy.
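The output-scaling finding can be pictured as a thin wrapper around a joint forecaster: request more future steps than are needed, then keep only the required horizon. The sketch below is illustrative only. The `Forecaster` signature, `forecast_with_extended_output`, `extra_steps`, and the toy model are hypothetical names introduced here, not the authors' API, and the toy model cannot reproduce the accuracy gain, which depends on a trained non-causal model.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a joint forecaster that emits `out_len` future
# values in a single pass, given the observed context.
Forecaster = Callable[[Sequence[float], int], List[float]]


def forecast_with_extended_output(
    model: Forecaster,
    context: Sequence[float],
    horizon: int,
    extra_steps: int,
) -> List[float]:
    """Ask the model for horizon + extra_steps values, keep the first horizon."""
    if horizon <= 0 or extra_steps < 0:
        raise ValueError("horizon must be positive and extra_steps non-negative")
    full = model(context, horizon + extra_steps)
    if len(full) < horizon:
        raise ValueError("model returned fewer values than the requested horizon")
    # The surplus positions are discarded; in a non-causal model they only
    # serve as additional output tokens that the kept positions can attend to.
    return full[:horizon]


def toy_joint_model(context: Sequence[float], out_len: int) -> List[float]:
    """Stand-in model (assumption): repeats the last observed value."""
    return [context[-1]] * out_len


if __name__ == "__main__":
    print(forecast_with_extended_output(toy_joint_model, [1.0, 2.0, 3.0], 4, 12))
```

With a real model, `extra_steps` would be the knob to vary when checking whether longer outputs improve accuracy on the kept horizon.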
Abstract
We present a joint forecasting framework for time series prediction that contrasts with traditional direct or recursive methods. This framework achieves state-of-the-art performance for our designed foundation model, YingLong, and reveals a novel scaling effect: longer outputs significantly enhance model accuracy due to delayed chain-of-thought reasoning in our non-causal approach. YingLong is a non-causal, bidirectional attention encoder-only transformer trained through masked token recovery, aligning more effectively with language understanding tasks than with generation tasks. Additionally, we boost performance by tackling output variance with a multi-input ensemble. We release four foundation models ranging from 6M to 300M parameters, demonstrating superior results in zero-shot tasks on the ETT and Weather datasets. YingLong achieves more than 60% best performance. To ensure…
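The abstract mentions a multi-input ensemble for reducing output variance but, as excerpted here, does not say how the inputs differ. One plausible reading, shown below purely as an assumption, is to run the same model on several context lengths taken from the end of the history and average the forecasts step by step. The names `multi_input_ensemble`, `context_lengths`, and `toy_mean_model` are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: forecaster(context, out_len) -> out_len future values.
Forecaster = Callable[[Sequence[float], int], List[float]]


def multi_input_ensemble(
    model: Forecaster,
    history: Sequence[float],
    horizon: int,
    context_lengths: Sequence[int],
) -> List[float]:
    """Average forecasts produced from several trailing windows of the history."""
    if not context_lengths:
        raise ValueError("at least one context length is required")
    forecasts = []
    for n in context_lengths:
        if n <= 0 or n > len(history):
            raise ValueError(f"invalid context length: {n}")
        # Use the most recent n observations as this member's input.
        forecasts.append(model(history[-n:], horizon))
    # Step-wise mean across ensemble members.
    return [sum(step) / len(forecasts) for step in zip(*forecasts)]


def toy_mean_model(context: Sequence[float], out_len: int) -> List[float]:
    """Stand-in model (assumption): predicts the mean of its context."""
    mean = sum(context) / len(context)
    return [mean] * out_len


if __name__ == "__main__":
    print(multi_input_ensemble(toy_mean_model, [1.0, 2.0, 3.0, 4.0], 3, [2, 4]))
```

Averaging is the simplest combination rule; a median or a weighted mean would fit the same interface.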
Taxonomy
Topics: Time Series Analysis and Forecasting · Complex Systems and Time Series Analysis
