Comparative Analysis of Machine Learning-Based Imputation Techniques for Air Quality Datasets with High Missing Data Rates
Sen Yan, David J. O'Connor, Xiaojun Wang, Noel E. O'Connor, Alan F. Smeaton, Mingming Liu

TL;DR
This paper compares machine learning imputation techniques for air quality datasets with high missing data rates, showing that diffusion and ensemble methods can achieve high accuracy even when 82.42% of the data is missing.
Contribution
It provides a comprehensive evaluation of imputation methods for high missing data rates in spatiotemporal air quality datasets, highlighting the effectiveness of diffusion and ensemble models.
Findings
Diffusion methods with external features achieved an F1 score of 0.9486.
Ensemble models achieved up to 94.82% accuracy.
Good performance is possible even at a missing data rate of 82.42%.
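The accuracy and F1 figures above depend on how imputed values are scored. This summary does not describe the paper's evaluation protocol, so the following is only a minimal sketch under stated assumptions: PM2.5 levels are encoded as integer classes, missing cells are `None`, a few known cells are hidden and then compared against their imputed values, and F1 is macro-averaged. The mode imputer is a trivial stand-in baseline, not one of the paper's diffusion or ensemble models, and all names and data here are hypothetical.

```python
from collections import Counter

def missing_rate(grid):
    """Fraction of None cells in a 2D list (rows = time steps, columns = locations)."""
    cells = [v for row in grid for v in row]
    return sum(v is None for v in cells) / len(cells)

def accuracy(y_true, y_pred):
    """Share of held-out cells whose imputed class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (the averaging mode is an assumption)."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def mode_impute(grid):
    """Stand-in baseline: fill every missing cell with the most common observed class."""
    observed = [v for row in grid for v in row if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [[mode if v is None else v for v in row] for row in grid]

# Hypothetical toy grid of PM2.5 classes; None marks a missing observation.
grid = [
    [0, None, 1],
    [None, 0, None],
    [0, None, 0],
]

# Hide two known cells, impute, then score only on those hidden cells.
held_out = [(0, 2), (2, 0)]
masked = [row[:] for row in grid]
for r, c in held_out:
    masked[r][c] = None

imputed = mode_impute(masked)
y_true = [grid[r][c] for r, c in held_out]
y_pred = [imputed[r][c] for r, c in held_out]

print(f"missing rate: {missing_rate(grid):.2%}")
print(f"accuracy: {accuracy(y_true, y_pred):.2f}, macro F1: {macro_f1(y_true, y_pred):.2f}")
```

Scoring only on deliberately hidden cells matters at high missing rates, because cells that were never observed have no ground truth to compare against.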
Abstract
Urban pollution poses serious health risks, particularly in relation to traffic-related air pollution, which remains a major concern in many cities. Vehicle emissions contribute to respiratory and cardiovascular issues, especially for vulnerable and exposed road users like pedestrians and cyclists. Therefore, accurate air quality monitoring with high spatial resolution is vital for good urban environmental management. This study aims to provide insights for processing spatiotemporal datasets with high missing data rates. In this study, the challenge of high missing data rates is a result of the limited data available and the fine granularity required for precise classification of PM2.5 levels. The data used for analysis and imputation were collected from both mobile sensors and fixed stations by Dynamic Parcel Distribution, the Environmental Protection Agency, and Google in Dublin,…
Taxonomy
Topics: Air Quality Monitoring and Forecasting · Air Quality and Health Impacts
Methods: Diffusion
