A Simple and Robust Multi-Fidelity Data Fusion Method for Effective Modeling of Citizen-Science Air Pollution Data
Camilla Andreozzi, Pietro Colombo, Philipp Otto

TL;DR
This paper introduces a robust multi-fidelity Gaussian process that fuses sparse, high-quality reference monitors with dense but noisy citizen-science sensors for air pollution modeling, keeping prediction error stable when the sensor data are contaminated.
Contribution
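It replaces the Gaussian log-likelihood in the high-fidelity channel of a multi-fidelity Gaussian process with a Huber loss on precision-weighted residuals, which bounds the influence of contaminated observations on all parameters while retaining the flexibility of co-kriging.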
Findings
The robust estimator maintains stable MAE and RMSE as anomaly magnitude and frequency increase, whereas the Gaussian MLE deteriorates.
The method improves predictive accuracy in an empirical analysis of PM2.5 data.
The framework is scalable and reproducible, with open-source code.
Abstract
We propose a robust multi-fidelity Gaussian process for integrating sparse, high-quality reference monitors with dense but noisy citizen-science sensors. The approach replaces the Gaussian log-likelihood in the high-fidelity channel with a global Huber loss applied to precision-weighted residuals, yielding bounded influence on all parameters, including the cross-fidelity coupling, while retaining the flexibility of co-kriging. We establish attenuation and unbounded influence of the Gaussian maximum likelihood estimator under low-fidelity contamination and derive explicit finite bounds for the proposed estimator that clarify how whitening and mean-shift sensitivity determine robustness. Monte Carlo experiments with controlled contamination show that the robust estimator maintains stable MAE and RMSE as anomaly magnitude and frequency increase, whereas the Gaussian MLE deteriorates…
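The contrast between the Gaussian objective and a Huber objective on whitened residuals can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the elementwise application of the Huber function to Cholesky-whitened residuals, and the threshold `delta = 1.345` are all assumptions made for illustration, and the paper's exact "global" formulation may differ.

```python
import numpy as np

def huber(r, delta=1.345):
    """Huber function: quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def whitened_residuals(y, mean, cov):
    """Whiten residuals with the Cholesky factor of the covariance.

    If cov = L L^T, then z = L^{-1} (y - mean) has identity covariance
    under the model, so sum(z**2) is the usual Mahalanobis quadratic form.
    """
    L = np.linalg.cholesky(cov)
    return np.linalg.solve(L, y - mean)

def gaussian_data_fit(y, mean, cov):
    """Data-fit term of the Gaussian negative log-likelihood."""
    z = whitened_residuals(y, mean, cov)
    return 0.5 * np.sum(z**2)

def huber_data_fit(y, mean, cov, delta=1.345):
    """Assumed robust analogue: Huber function applied to each whitened residual."""
    z = whitened_residuals(y, mean, cov)
    return np.sum(huber(z, delta))

# Toy data (hypothetical): three observations, one gross outlier.
mean = np.zeros(3)
cov = np.eye(3)
y_clean = np.array([0.5, -0.5, 1.0])
y_outlier = np.array([0.0, 0.0, 10.0])

print(gaussian_data_fit(y_outlier, mean, cov))  # grows quadratically in the outlier
print(huber_data_fit(y_outlier, mean, cov))     # grows only linearly in the outlier
```

In this sketch the two objectives coincide when every whitened residual lies within `delta`, and they diverge only for large residuals, where the Huber term grows linearly rather than quadratically. That linear growth is what limits the pull a single anomalous reading can exert on the fitted parameters.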
Taxonomy
Topics: Air Quality Monitoring and Forecasting · Soil Geostatistics and Mapping · Air Quality and Health Impacts
