AirQualityBench: A Realistic Evaluation Benchmark for Global Air Quality Forecasting

Xing Xu; Xu Wang; Yudong Zhang; Huilin Zhao; Zhengyang Zhou; Yang Wang

arXiv:2605.05854·cs.AI·May 8, 2026

TL;DR

AirQualityBench is a comprehensive benchmark for evaluating air quality forecasting models under real-world conditions, emphasizing missing data, heterogeneous scales, and global coverage.

Contribution

The paper introduces a realistic, global multi-pollutant benchmark that preserves native observation masks and evaluates models in conditions mimicking actual monitoring networks.

Findings

01. Models that score well on sanitized data do not transfer well to fragmented real-world observation streams.

02. Benchmark data and code are publicly available for reproducibility.

03. Evaluation under realistic missingness reveals how robust models actually are.

Abstract

Air-quality forecasting models are commonly evaluated on regional, preprocessed, and normalized datasets, where missing observations are removed or artificially completed. Such protocols simplify comparison but hide the conditions that dominate real monitoring networks: uneven global coverage, structured missingness, heterogeneous pollutant scales, and deployment cost. We introduce AirQualityBench, a global multi-pollutant benchmark designed to evaluate forecasting models under these realistic conditions. The benchmark contains hourly observations from 3,720 monitoring stations over 2021-2025, covers six major pollutants, and preserves provider-native observation masks. Rather than imputing a dense data tensor, AirQualityBench exposes missingness as part of the forecasting problem and reports errors on valid future observations after inverse transformation to physical…
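The evaluation idea in the abstract, scoring only valid future observations after undoing any normalization, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration rather than the benchmark's actual code: the function name `masked_mae`, the `(time, station, pollutant)` array layout, and the use of per-pollutant z-score normalization as the transformation being inverted.

```python
import numpy as np


def masked_mae(pred_norm, target_phys, mask, mean, std):
    """Mean absolute error over observed entries only, in physical units.

    pred_norm:   model output in normalized space, shape (T, N, P)
    target_phys: ground truth in physical units, NaN where unobserved
    mask:        boolean array, True where the future observation is valid
    mean, std:   per-pollutant statistics, shape (P,)
                 (hypothetical z-score scaling, assumed for this sketch)
    """
    # Undo the assumed z-score normalization so errors are in physical units.
    pred_phys = pred_norm * std + mean
    # Keep only entries that were actually observed; nothing is imputed.
    abs_err = np.abs(pred_phys - target_phys)[mask]
    if abs_err.size == 0:
        return float("nan")  # no valid observations to score
    return float(abs_err.mean())


# Toy data: 2 time steps, 1 station, 2 pollutants on very different scales.
mean = np.array([10.0, 40.0])
std = np.array([5.0, 20.0])
pred_norm = np.zeros((2, 1, 2))  # predicts the per-pollutant mean everywhere
target_phys = np.array([[[12.0, np.nan]], [[np.nan, 30.0]]])
mask = ~np.isnan(target_phys)  # the native observation mask

score = masked_mae(pred_norm, target_phys, mask, mean, std)
print(score)  # 6.0
```

Only the two observed entries contribute (absolute errors of 2 and 10), so missing values neither reward nor penalize the model. Averaging across pollutants with different scales, as done here for brevity, is a simplification; a per-pollutant breakdown would apply the same masking to each slice.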

Code & Models

Repositories

Star-Learning/AirQualityBench (GitHub)
