# Uncovering Benzene Pollution Patterns Using an Interpretable, Setting-Aware Artificial Intelligence Approach

**Authors:** Ivan Bešlić, Timea Bezdan, Gordana Jovanović, Silvije Davila, Gordana Pehnec, Snježana Herceg Romanić, Andreja Stojić, Mirjana Perišić

PMC · DOI: 10.3390/toxics14020181 · Toxics · 2026-02-18

## TL;DR

This study applies an interpretable AI framework to seven years of hourly air quality data from Zagreb and identifies seven environmental settings, two of which dominate benzene extremes.

## Contribution

The approach combines SHAP-based interpretation of an ensemble tree model with PaCMAP embedding and HDBSCAN clustering to identify coherent pollution regimes (environmental settings) in urban air quality.
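According to the abstract, the interpretable-AI component relies on Shapley additive explanations (SHAP). As a minimal sketch of the underlying idea, the snippet below computes exact Shapley values for a single sample by enumerating feature coalitions. The model `toy_benzene`, its three inputs, and all numbers are hypothetical illustrations, not the paper's Extra Trees model or data, and practical SHAP tooling for tree ensembles uses faster tree-specific algorithms rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample via coalition enumeration.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2**n, so this is only feasible for a handful of features.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_benzene(features):
    """Hypothetical stand-in model (assumption): CO, wind speed, scaled boundary-layer height."""
    co, wind, blh = features
    return 2.0 * co - 0.5 * wind + 0.3 * co * (1.0 - blh)


sample = [1.2, 0.8, 0.3]    # hypothetical stagnant-hour inputs
baseline = [0.5, 2.0, 1.0]  # hypothetical reference inputs
phi = shapley_values(toy_benzene, sample, baseline)
print(dict(zip(["CO", "wind", "BLH"], phi)))
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that makes per-feature attributions comparable across samples before they are embedded and clustered.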

## Key findings

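- Seven environmental settings (C0–C6) and one residual group were identified; two of them, C4 and C6, dominated benzene extremes in Zagreb.
- Setting C6 reflects winter stagnation with strong combustion influence, shallow boundary layers, weak winds, and high humidity, while C4 reflects a synoptic stability regime with enhanced heat fluxes that diminished after the COVID-19 period, consistent with altered anthropogenic activity.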
- Low-benzene settings (C0, C1, C3) are linked to stronger mixing and higher oxidizing capacity.

## Abstract

We investigated benzene variability in an urban environment using an interpretable, setting-based artificial intelligence framework. A seven-year dataset (2017–2023) of hourly pollutant concentrations (benzene, NO2, SO2, CO, O3) measured in Zagreb (Croatia) was analyzed, as were meteorological variables. Multiple ensemble decision tree models were developed, with hyperparameters optimized using metaheuristic algorithms. The best-performing model, Extra Trees optimized by the Sine Cosine Algorithm, achieved an R² of 0.87. Model interpretation employed Shapley additive explanations (SHAP), followed by PaCMAP embedding and HDBSCAN clustering to identify coherent environmental settings. Seven settings (C0–C6) and one residual group were identified, representing pollution-enhancing, suppressing, and transitional regimes. Two settings dominated benzene extremes. C6 reflected winter stagnation, characterized by strong combustion influence (CO contribution of 11.9%), shallow boundary layers (~290 m), weak winds, and high humidity. C4 represented a synoptic stability regime with enhanced heat fluxes; it diminished after the COVID-19 period, consistent with altered anthropogenic activity. Low-benzene settings (C0, C1, C3) were associated with stronger mixing and higher oxidizing capacity, while transitional settings (C2, C5) reflected moderate conditions. Overall, the results show that a small number of environmental settings governed the benzene extremes, providing a transferable and interpretable framework for air quality assessment and policy support.
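The abstract reports that hyperparameters were tuned with the Sine Cosine Algorithm. The sketch below shows the commonly published form of that metaheuristic on a toy problem, as a rough illustration rather than the authors' implementation. The objective `toy_validation_error`, the two hyperparameter-like dimensions, their bounds, and the population settings are all assumptions; in a real workflow the objective would be a cross-validated model error.

```python
import math
import random


def sine_cosine_minimize(objective, bounds, n_agents=20, n_iter=100, a=2.0, seed=0):
    """Minimize `objective` over box `bounds` with a basic Sine Cosine Algorithm."""
    rng = random.Random(seed)
    dim = len(bounds)
    agents = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    best = min(agents, key=objective)[:]
    best_score = objective(best)

    for t in range(n_iter):
        r1 = a - t * a / n_iter  # step amplitude shrinks linearly over iterations
        for agent in agents:
            for d in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                # Switch between the sine and cosine update with equal probability
                wave = math.sin(r2) if rng.random() < 0.5 else math.cos(r2)
                step = r1 * wave * abs(r3 * best[d] - agent[d])
                lo, hi = bounds[d]
                agent[d] = min(hi, max(lo, agent[d] + step))  # clip to bounds
        # Update the destination (best-so-far) point after all agents have moved
        candidate = min(agents, key=objective)
        candidate_score = objective(candidate)
        if candidate_score < best_score:
            best, best_score = candidate[:], candidate_score
    return best, best_score


def toy_validation_error(params):
    """Hypothetical error surface (assumption) with a minimum at (300, 0.6)."""
    n_trees, feature_fraction = params
    return (n_trees - 300.0) ** 2 / 1e4 + (feature_fraction - 0.6) ** 2


search_bounds = [(50.0, 500.0), (0.1, 1.0)]  # hypothetical search ranges
best_params, best_error = sine_cosine_minimize(toy_validation_error, search_bounds)
print(best_params, best_error)
```

Tracking the best-so-far point separately from the moving agents means the returned error can never be worse than the best member of the initial random population, and the fixed seed makes the run repeatable.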

## Linked entities

- **Chemicals:** benzene (PubChem CID 241), NO2 (PubChem CID 946), SO2 (PubChem CID 1119), CO (PubChem CID 281), O3 (PubChem CID 24823)

## Full-text entities

- **Genes:** SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** injury to (MESH:D014947), ES (MESH:D012512), COVID (MESH:D000086382), deaths (MESH:D003643), carcinogenic (MESH:D011230)
- **Chemicals:** VOC (MESH:D055549), hydroxyl radicals (MESH:D017665), PAHs (MESH:D011084), C6 (MESH:C117224), water (MESH:D014867), Benzene (MESH:D001554), NO2 (MESH:D009585), CO (MESH:D002248), C4 (MESH:C058899), SO2 (MESH:D013458), nitrogen oxides (MESH:D009589), O3 (MESH:D010126), Co-pollutants (-), sulfur (MESH:D013455)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12944871/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12944871/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/PMC12944871/full.md

---
Source: https://tomesphere.com/paper/PMC12944871