QuitoBench: A High-Quality Open Time Series Forecasting Benchmark

Siqiao Xue; Zhaoyang Zhu; Wei Zhang; Rongyao Cai; Rui Wang; Yixiang Mu; Fan Zhou; Jianguo Li; Peng Di; Hang Yu

arXiv:2603.26017·cs.LG·March 30, 2026


TL;DR

QuitoBench is a regime-balanced benchmark for time series forecasting. It evaluates models across eight trend × seasonality × forecastability regimes and reports how performance varies with context length, forecastability, and model scale.

Contribution

The paper presents QuitoBench, a large-scale, regime-balanced benchmark built on Quito, a billion-scale corpus of Alipay application traffic, to support more fine-grained and reproducible evaluation of forecasting models.

Findings

01. Deep learning models outperform foundation models at short contexts; foundation models excel at long contexts.

02. Forecastability significantly impacts forecasting difficulty, with a 3.64× MAE gap.

03. Deep learning models achieve comparable or better performance with 59× fewer parameters.
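
The MAE gap in finding 02 is a ratio of mean absolute errors between groups of series. The sketch below shows how such a ratio could be computed. It assumes the gap compares low-forecastability against high-forecastability series; the function name `mae` and all numbers are hypothetical toy values, not taken from the paper.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error over all forecast points."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical toy data (not from the paper): one group that is hard to
# forecast and one that is easy to forecast.
low_true = [1.0, 2.0, 3.0, 4.0]
low_pred = [2.0, 0.0, 5.0, 3.0]    # absolute errors: 1, 2, 2, 1
high_true = [1.0, 2.0, 3.0, 4.0]
high_pred = [1.5, 2.5, 2.5, 3.5]   # absolute errors: 0.5 each

# Ratio of the harder group's MAE to the easier group's MAE.
mae_gap = mae(low_true, low_pred) / mae(high_true, high_pred)
print(f"MAE gap: {mae_gap:.2f}x")
```

A ratio is used rather than a difference so that the gap is independent of the scale of the underlying series.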

Abstract

Time series forecasting is critical across finance, healthcare, and cloud computing, yet progress is constrained by a fundamental bottleneck: the scarcity of large-scale, high-quality benchmarks. To address this gap, we introduce QuitoBench, a regime-balanced benchmark for time series forecasting with coverage across eight trend $\times$ seasonality $\times$ forecastability (TSF) regimes, designed to capture forecasting-relevant properties rather than application-defined domain labels. The benchmark is built upon Quito, a billion-scale time series corpus of application traffic from Alipay spanning nine business domains. Benchmarking 10 models from deep learning, foundation models, and statistical baselines across 232,200 evaluation instances, we report four key findings: (i) a context-length crossover where deep learning models lead at short context ($L = 96$) but foundation…
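
The eight TSF regimes named in the abstract can be read as the Cartesian product of three axes. The sketch below enumerates them under the assumption that each axis is split into two levels, which is the simplest reading consistent with the count of eight; the level names `low` and `high` are hypothetical and do not come from the paper.

```python
from itertools import product

# The three axes named in the abstract.
AXES = ("trend", "seasonality", "forecastability")

# Assumption: each axis is binarized into two levels (2 * 2 * 2 = 8).
LEVELS = ("low", "high")

# One dict per regime, e.g. {"trend": "low", "seasonality": "high", ...}.
regimes = [dict(zip(AXES, combo)) for combo in product(LEVELS, repeat=len(AXES))]

for i, regime in enumerate(regimes, start=1):
    print(i, regime)
```

A regime-balanced benchmark would then draw a comparable number of evaluation series from each of these cells, rather than sampling in proportion to application domain.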
