Rolling-Origin Validation Reverses Model Rankings in Multi-Step PM10 Forecasting: XGBoost, SARIMA, and Persistence
Federico Garcia Crespi, Eduardo Yubero Funes, Marina Alfosea Simon

TL;DR
This study shows that the evaluation protocol changes model rankings in PM10 forecasting: rolling-origin validation reverses the ordering obtained from a static chronological split, which argues for assessment protocols that mirror routine operational updating.
Contribution
The paper demonstrates that rolling-origin validation can reverse model rankings relative to static evaluation, highlighting the need for operationally relevant testing in air quality forecasting.
Findings
Rolling-origin validation reverses model rankings compared to static evaluation.
SARIMA remains positively skilled across all horizons under rolling-origin evaluation.
Under rolling-origin evaluation, XGBoost is not consistently better than persistence at short and intermediate horizons.
Abstract
(a) Many air quality forecasting studies report gains from machine learning, but evaluations often use static chronological splits and omit persistence baselines, so the operational added value under routine updating is unclear. (b) Using 2,350 daily PM10 observations from 2017 to 2024 at an urban background monitoring station in southern Europe, we compare XGBoost and SARIMA against persistence under a static split and a rolling-origin protocol with monthly updates. We report horizon-specific skill and the predictability horizon, defined as the maximum horizon with positive persistence-relative skill. Static evaluation suggests XGBoost performs well from one to seven days ahead, but rolling-origin evaluation reverses rankings: XGBoost is not consistently better than persistence at short and intermediate horizons, whereas SARIMA remains positively skilled across the full range. (c)…
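The protocol described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the "model" is a hypothetical 7-day moving-average stand-in (not XGBoost or SARIMA), the 30-day step is only an approximation of "monthly updates", and the skill formula 1 - RMSE_model / RMSE_persistence is an assumed definition, since the summary does not state which error metric the paper uses. The predictability horizon follows the abstract's wording: the largest horizon with positive persistence-relative skill.

```python
import numpy as np

def rolling_origins(n_obs, initial_train, step, max_horizon):
    """Yield forecast origins (index of the last observed value)."""
    origin = initial_train - 1
    # Stop once a full max_horizon window no longer fits in the series.
    while origin + max_horizon < n_obs:
        yield origin
        origin += step

def persistence_forecast(history, max_horizon):
    """Persistence baseline: repeat the last observed value at every horizon."""
    return np.full(max_horizon, history[-1])

def moving_average_forecast(history, max_horizon, window=7):
    """Hypothetical stand-in model: mean of the last `window` observations."""
    return np.full(max_horizon, history[-window:].mean())

def rolling_origin_rmse(y, forecast_fn, initial_train, step, max_horizon):
    """RMSE per horizon, pooled over all rolling origins."""
    errors = []
    for origin in rolling_origins(len(y), initial_train, step, max_horizon):
        history = y[: origin + 1]                          # data available at the origin
        actual = y[origin + 1 : origin + 1 + max_horizon]  # the next max_horizon values
        errors.append(forecast_fn(history, max_horizon) - actual)
    return np.sqrt(np.mean(np.asarray(errors) ** 2, axis=0))

def skill(rmse_model, rmse_persistence):
    """Assumed skill definition: positive means the model beats persistence."""
    return 1.0 - rmse_model / rmse_persistence

def predictability_horizon(skill_by_horizon):
    """Largest horizon (1-based) with positive skill; 0 if there is none."""
    positive = np.flatnonzero(np.asarray(skill_by_horizon) > 0)
    return int(positive[-1]) + 1 if positive.size else 0

# Synthetic AR(1) series, used only so the sketch runs end to end.
rng = np.random.default_rng(0)
n_obs, max_horizon = 2350, 7
y = np.zeros(n_obs)
for t in range(1, n_obs):
    y[t] = 0.7 * y[t - 1] + rng.normal()

settings = dict(initial_train=1000, step=30, max_horizon=max_horizon)
rmse_persistence = rolling_origin_rmse(y, persistence_forecast, **settings)
rmse_model = rolling_origin_rmse(y, moving_average_forecast, **settings)
skill_by_horizon = skill(rmse_model, rmse_persistence)

print("skill by horizon:", np.round(skill_by_horizon, 3))
print("predictability horizon:", predictability_horizon(skill_by_horizon))
```

In a real study, the forecast function would refit or update the model on the history available at each origin. A static split corresponds to a single origin with one fitted model, which is why the two protocols can rank models differently.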
Taxonomy
Topics: Air Quality Monitoring and Forecasting · Air Quality and Health Impacts · Atmospheric chemistry and aerosols
