Does Scaling Law Apply in Time Series Forecasting?
Zeyan Li, Libing Chen, Yin Tang

TL;DR
This paper introduces Alinear, a lightweight time series forecasting model that challenges the necessity of large models by achieving competitive accuracy with significantly fewer parameters through adaptive decomposition and frequency attenuation.
Contribution
We propose Alinear, an ultra-lightweight forecasting model with adaptive mechanisms, demonstrating that smaller models can outperform larger ones in time series forecasting tasks.
Findings
Alinear outperforms larger models on seven benchmark datasets.
It maintains accuracy across various forecasting horizons.
It uses less than 1% of the parameters of large models.
Abstract
Rapid expansion of model size has emerged as a key challenge in time series forecasting. From early Transformers with tens of megabytes to recent architectures like TimesNet with thousands of megabytes, performance gains have often come at the cost of exponentially increasing parameter counts. But is this scaling truly necessary? To question the applicability of the scaling law in time series forecasting, we propose Alinear, an ultra-lightweight forecasting model that achieves competitive performance using only k-level (on the order of thousands) parameters. We introduce a horizon-aware adaptive decomposition mechanism that dynamically rebalances component emphasis across different forecast lengths, alongside a progressive frequency attenuation strategy that achieves stable prediction across various forecasting horizons without incurring the computational overhead of attention mechanisms. Extensive experiments on seven…
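The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the formulas, so the moving-average decomposition, the weighting rule `trend_weight`, the decay rule in `attenuate_high_freq`, and every name and default value below are assumptions chosen only to show the general idea of a decomposition-based linear forecaster whose behavior depends on the forecast horizon.

```python
import numpy as np

def moving_average(x, kernel):
    """Trend estimate: moving average with edge padding, same length as x."""
    pad_left = (kernel - 1) // 2
    pad_right = kernel - 1 - pad_left
    padded = np.concatenate([np.full(pad_left, x[0]), x, np.full(pad_right, x[-1])])
    return np.convolve(padded, np.ones(kernel) / kernel, mode="valid")

def attenuate_high_freq(x, horizon, base_decay=0.01):
    """Hypothetical attenuation rule: damp higher frequencies more as the horizon grows."""
    spec = np.fft.rfft(x)
    k = np.arange(spec.shape[0])
    gain = np.exp(-base_decay * horizon / len(x) * k)  # gain is 1 at frequency 0
    return np.fft.irfft(spec * gain, n=len(x))

def trend_weight(horizon, scale=96.0):
    """Hypothetical horizon-aware weight in (0, 1): longer horizons lean on the trend."""
    return horizon / (horizon + scale)

class TinyLinearForecaster:
    """Two linear heads (trend, seasonal) blended by a horizon-dependent weight."""

    def __init__(self, lookback, horizon, kernel=25, seed=0):
        rng = np.random.default_rng(seed)
        self.lookback, self.horizon, self.kernel = lookback, horizon, kernel
        # Untrained random weights; a real model would fit these to data.
        self.W_trend = rng.normal(0.0, 0.01, (horizon, lookback))
        self.W_season = rng.normal(0.0, 0.01, (horizon, lookback))
        self.b = np.zeros(horizon)

    def num_params(self):
        return self.W_trend.size + self.W_season.size + self.b.size

    def predict(self, x):
        trend = moving_average(x, self.kernel)
        season = attenuate_high_freq(x - trend, self.horizon)
        w = trend_weight(self.horizon)
        return w * (self.W_trend @ trend) + (1.0 - w) * (self.W_season @ season) + self.b

model = TinyLinearForecaster(lookback=96, horizon=96)
history = np.sin(np.linspace(0.0, 8.0 * np.pi, 96)) + 0.01 * np.arange(96)
forecast = model.predict(history)
print(forecast.shape, model.num_params())
```

With a lookback of 96 and a horizon of 96, this sketch has 2 × 96 × 96 + 96 = 18,528 parameters, which shows how a purely linear design stays in the thousands-of-parameters range. The figure describes this sketch only and says nothing about Alinear's actual parameter count.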
Taxonomy
Topics: Forecasting Techniques and Applications · Stock Market Forecasting Methods
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Dropout · Layer Normalization · Byte Pair Encoding · Softmax · Absolute Position Encodings · Residual Connection
