TL;DR
AirQualityBench is a comprehensive benchmark for evaluating air quality forecasting models under real-world conditions, emphasizing missing data, heterogeneous scales, and global coverage.
Contribution
The paper introduces a realistic, global multi-pollutant benchmark that preserves native observation masks and evaluates models in conditions mimicking actual monitoring networks.
Findings
Models that perform strongly on sanitized data do not transfer well to fragmented real-world streams.
Benchmark data and code are publicly available for reproducibility.
Evaluating models under realistic missingness reveals their true robustness.
Abstract
Air-quality forecasting models are commonly evaluated on regional, preprocessed, and normalized datasets, where missing observations are removed or artificially completed. Such protocols simplify comparison but hide the conditions that dominate real monitoring networks: uneven global coverage, structured missingness, heterogeneous pollutant scales, and deployment cost. We introduce AirQualityBench, a global multi-pollutant benchmark designed to evaluate forecasting models under these realistic conditions. The benchmark contains hourly observations from 3,720 monitoring stations over 2021–2025, covers six major pollutants, and preserves provider-native observation masks. Rather than imputing a dense data tensor, AirQualityBench exposes missingness as part of the forecasting problem and reports errors on valid future observations after inverse transformation to physical…
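The evaluation idea in the abstract (score only the future observations that actually exist, after mapping predictions back from the normalized scale) can be illustrated with a minimal sketch. This is not the benchmark's code: the function name `masked_mae`, the array layout, and the choice of a z-score normalization with `mean` and `std` are assumptions made for illustration, since the abstract does not specify which transformation is inverted or which error metric is reported.

```python
import numpy as np

def masked_mae(pred_norm, target_norm, mask, mean, std):
    """Mean absolute error over observed entries only.

    Assumption: values were z-score normalized, so the inverse
    transformation is x * std + mean. The benchmark may use another one.
    """
    # Undo the (assumed) normalization so errors are on the original scale.
    pred = pred_norm * std + mean
    target = target_norm * std + mean

    mask = mask.astype(bool)
    n_valid = mask.sum()
    if n_valid == 0:
        # No valid future observation in this window: the score is undefined.
        return float("nan")

    # np.where keeps NaN placeholders at unobserved positions out of the sum.
    abs_err = np.where(mask, np.abs(pred - target), 0.0)
    return float(abs_err.sum() / n_valid)

# Toy example: 2 time steps x 2 stations, one target value missing.
target_norm = np.array([[0.0, np.nan], [1.0, -1.0]])
pred_norm = np.array([[0.5, 0.3], [1.0, 0.0]])
mask = np.array([[1, 0], [1, 1]])  # 1 = observed, 0 = missing

score = masked_mae(pred_norm, target_norm, mask, mean=10.0, std=4.0)
print(score)  # 2.0
```

The missing entry contributes nothing to either the error sum or the count, so a model is neither rewarded nor penalized for what it predicts where no observation exists. An imputation-based protocol would instead score predictions against filled-in values at those positions.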
