# A dataset of harmonized global air quality monitoring metadata

**Authors:** Stefania Renna, Carlos Rodriguez-Pardo, Lara Aleluia Reis

PMC · DOI: 10.1038/s41597-026-06797-0 · Scientific Data · 2026-02-17

## TL;DR

This paper creates a global dataset of air quality monitoring stations with detailed metadata to support exposure and health impact studies.

## Contribution

The novel contribution is a harmonized global dataset of air quality station metadata, in which stations are labeled by a machine learning classifier.

## Key findings

- A machine learning model was developed to classify air quality stations by area characteristics (urban, rural) and source type (background, non-background).
- The dataset includes ~15,000 governmental particulate matter monitors from 106 countries with standardized metadata.
- The dataset supports global and regional analyses of air quality exposure and health impacts.

## Abstract

This study addresses the gap in air quality monitoring metadata reporting by building a classifier for air quality station types and area characteristics. It leverages ultra-high-resolution land cover data, complemented by additional demographic and gridded information. We employ advanced machine learning methods, including convolutional neural networks and transformers. Through a custom training approach, we fine-tune pre-trained models on 7,000 images and label more than 8,000 additional monitors, resulting in a robust model for classifying air quality stations by area characteristics (urban, rural) and source type (background, non-background). The result is a global harmonized dataset of governmental air quality station metadata for particulate matter, with ~15,000 monitors from 106 countries. For each station, the dataset provides an identifier, geographical coordinates, country, area characteristics, source type, and classification status. This dataset enables global feasibility studies and regional analyses of conditions leading to exposure. By providing a consistent classification of monitoring stations, it also allows for meaningful comparisons of sectoral exposure contributions across countries, regions, and station types, supporting comparative studies and health impact assessments.
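
The per-station fields listed in the abstract can be pictured as a simple record. The sketch below is illustrative only: the column names, the `status` values, and the sample rows are assumptions made for this example, not the published file layout, which should be checked against the dataset itself.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


# Hypothetical schema: field names are assumed, not taken from the dataset.
@dataclass(frozen=True)
class Station:
    station_id: str
    latitude: float
    longitude: float
    country: str
    area_type: str    # "urban" or "rural"
    source_type: str  # "background" or "non-background"
    status: str       # assumed labels, e.g. "reported" or "predicted"


def load_stations(handle):
    """Parse station rows from any CSV file-like object."""
    return [
        Station(
            station_id=row["station_id"],
            latitude=float(row["latitude"]),
            longitude=float(row["longitude"]),
            country=row["country"],
            area_type=row["area_type"],
            source_type=row["source_type"],
            status=row["status"],
        )
        for row in csv.DictReader(handle)
    ]


def count_by_class(stations):
    """Count stations per (area_type, source_type) pair."""
    return Counter((s.area_type, s.source_type) for s in stations)


# Made-up rows for illustration; these are not real stations.
SAMPLE = """station_id,latitude,longitude,country,area_type,source_type,status
X001,45.46,9.19,IT,urban,non-background,reported
X002,45.07,7.69,IT,urban,background,predicted
X003,40.10,-3.50,ES,rural,background,predicted
"""

stations = load_stations(io.StringIO(SAMPLE))
counts = count_by_class(stations)

# Select the urban background monitors.
urban_background = [
    s.station_id
    for s in stations
    if s.area_type == "urban" and s.source_type == "background"
]
```

Grouping by the (area, source) pair mirrors the two classification axes described in the abstract, so the same counting step would apply per country or per region once real data is loaded.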

## Full-text entities

- **Diseases:** deaths (MESH:D003643)
- **Chemicals:** NOx (MESH:D009589), CO (MESH:D002248), NH3 (MESH:D000641), PM (MESH:D011399), SO2 (MESH:D013458)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13022413/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13022413/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/PMC13022413/full.md

---
Source: https://tomesphere.com/paper/PMC13022413