Scaling Transformers for Time Series Forecasting: Do Pretrained Large Models Outperform Small-Scale Alternatives?
Sanjay Chakraborty, Ibrahim Delibasoglu, Fredrik Heintz

TL;DR
This paper empirically compares large pre-trained time series models with small-scale transformers, assessing their performance, efficiency, and interpretability across benchmarks to determine when pretraining is beneficial.
Contribution
It provides a comprehensive empirical evaluation of pre-trained large-scale time series models versus non-pretrained small-scale transformers, highlighting their relative strengths and limitations.
Findings
Pre-trained models outperform small models in certain forecasting scenarios.
Pretraining improves accuracy but may increase computational costs.
Small models remain competitive in some tasks and are more efficient.
Abstract
Large pre-trained models have demonstrated remarkable capabilities across domains, but their effectiveness in time series forecasting remains understudied. This work empirically examines whether pre-trained large-scale time series models (LSTSMs) trained on diverse datasets can outperform traditional non-pretrained small-scale transformers in forecasting tasks. We analyze state-of-the-art (SOTA) pre-trained universal time series models (e.g., Moirai, TimeGPT) alongside conventional transformers, evaluating accuracy, computational efficiency, and interpretability across multiple benchmarks. Our findings reveal the strengths and limitations of pre-trained LSTSMs, providing insights into their suitability for time series tasks compared to task-specific small-scale architectures. The results highlight scenarios where pretraining offers advantages and where simpler models remain competitive.
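The abstract names accuracy and computational efficiency as evaluation axes but does not spell out a protocol. The following is a minimal sketch of how two forecasters could be compared on both axes on a held-out horizon. Everything in it is an assumption made for illustration: the two toy forecasters (`naive_forecast`, `seasonal_naive_forecast`) are hypothetical stand-ins for a small model and a pre-trained model, the synthetic series is not one of the paper's benchmarks, and MAE/MSE plus wall-clock time are common choices rather than the metrics confirmed by the source.

```python
import math
import time
from statistics import fmean


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def naive_forecast(history, horizon):
    """Hypothetical baseline: repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, period=4):
    """Hypothetical baseline: repeat the last full seasonal cycle."""
    return [history[-period + (h % period)] for h in range(horizon)]


def evaluate(forecaster, series, context_len, horizon):
    """Score one forecaster on accuracy and inference time.

    The first `context_len` points are the model input; the next
    `horizon` points are the held-out target.
    """
    history = series[:context_len]
    target = series[context_len:context_len + horizon]
    start = time.perf_counter()
    pred = forecaster(history, horizon)
    elapsed = time.perf_counter() - start
    return {"mae": mae(target, pred), "mse": mse(target, pred), "seconds": elapsed}


# Synthetic series with period 4 (an assumption, not a paper benchmark).
series = [math.sin(2 * math.pi * t / 4) for t in range(24)]

candidates = {
    "naive": naive_forecast,
    "seasonal_naive": seasonal_naive_forecast,
}
results = {
    name: evaluate(fn, series, context_len=16, horizon=8)
    for name, fn in candidates.items()
}

for name, scores in results.items():
    print(name, scores)
```

In a real comparison, each entry in `candidates` would wrap an actual model's prediction call, and the timing would typically be averaged over many windows and reported alongside parameter counts or memory use.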
