# Human and environmental controls on soil contamination in a dust-prone region revealed by random forest and Shapley additive explanations analysis

**Authors:** Zohre Ebrahimi-Khusfi, Shamsollah Ayoubi, Seyed Arman Samadi-Todar, Narjes Okati

PMC · DOI: 10.1038/s41598-026-40377-x · Scientific Reports · 2026-02-21

## TL;DR

This study uses random forest models and SHAP analysis to predict soil contamination by five potentially toxic elements (As, Cd, Co, Cr, and Pb) in a dust-prone region of central Iran and to identify the key human and environmental drivers.

## Contribution

The study introduces a novel combination of human activity and soil physicochemical factors for predicting soil contamination using random forest and SHAP analysis.

## Key findings

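- Scenario VI (human activity-based factors + physicochemical soil properties, HAF + PSP) best predicted arsenic, cobalt, and chromium, with R² values of 0.59, 0.60, and 0.58, respectively.
- Scenario X (PSP + LSF + HAF + RSAD) best predicted cadmium, with an R² of 0.67.
- HAF and PSP were the most influential predictor categories: HAF contributed 54.9% to the Cd prediction, and HAF and PSP averaged 55.6% and 44.4%, respectively, for As, Co, and Cr.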
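
The scenario setup behind these findings can be sketched as unions of predictor categories, scored with R². This is a minimal illustration, not the authors' code: only the category combinations for Scenarios VI and X come from the paper, while the column names, their assignment to categories, and the `rsad_covariate` placeholder are assumptions. The R² definition used here (1 - SS_res / SS_tot) is the common one and may differ from the paper's.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical column names; the assignment of predictors to
# categories is an assumption made for illustration only.
FEATURE_GROUPS = {
    "HAF": ["dist_to_industries"],
    "PSP": ["calcium", "magnesium", "magnetic_susceptibility"],
    "LSF": ["terrain_ruggedness_index", "dist_to_rivers"],
    "RSAD": ["rsad_covariate"],  # placeholder, not named in the source
}

# Category combinations for the two scenarios named in the summary.
SCENARIOS = {
    "VI": ("HAF", "PSP"),
    "X": ("PSP", "LSF", "HAF", "RSAD"),
}


def scenario_columns(name):
    """Return the predictor columns used under a given scenario."""
    return [col for group in SCENARIOS[name] for col in FEATURE_GROUPS[group]]
```

A model would then be fitted once per scenario on `scenario_columns(name)` and the scenario with the highest held-out R² retained for each element.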

## Abstract

Accurate prediction of the spatial distribution of potentially toxic elements (PTEs) and identification of the most important environmental drivers are essential for reducing their adverse effects on human health and the environment. This study predicted the spatial distribution of arsenic (As), cadmium (Cd), cobalt (Co), chromium (Cr), and lead (Pb) in a dust-prone area of central Iran using a random forest (RF) model under 11 scenarios constructed from human activity-based factors (HAF), land-based factors (LSF), physicochemical soil properties (PSP), meteorological factors (MF), and remote sensing auxiliary data (RSAD). The overall contribution of the influencing factors in predicting soil PTEs was determined using SHapley Additive exPlanations (SHAP) analysis. Concentrations of PTEs and several other properties were measured in the laboratory for 107 surface soil samples. RF performed best in predicting As, Co, and Cr under Scenario VI (HAF + PSP), with R² values of 0.59, 0.60, and 0.58, respectively. RF under Scenario X (PSP + LSF + HAF + RSAD) performed best in predicting Cd (R² = 0.67). RF performance for Pb was weak in all scenarios (R² < 0.38). The contributions of HAF, LSF, PSP, and RSAD in predicting Cd were 54.9, 21.5, 18.3, and 5.2%, respectively. On average, the contributions of HAF and PSP to the prediction of the other three PTEs (As, Co, and Cr) were 55.6 and 44.4%, respectively. Within these categories, distance to industries, calcium, magnesium, magnetic susceptibility, terrain ruggedness index, and distance to rivers were identified as the most important predictors. These findings are useful for improving soil management to reduce the adverse effects of PTEs in arid environments.
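
One common way to arrive at category-level shares like those reported in the abstract is to sum the mean absolute SHAP value of each feature within its category and normalize the totals to 100%. The summary does not describe the authors' exact aggregation, so the sketch below is an assumption; the feature names, their category assignments, and all numeric values are hypothetical and are not results from the paper.

```python
def category_contributions(mean_abs_shap, feature_to_category):
    """Aggregate per-feature mean |SHAP| values into category shares (%)."""
    totals = {}
    for feature, value in mean_abs_shap.items():
        category = feature_to_category[feature]
        totals[category] = totals.get(category, 0.0) + value
    grand_total = sum(totals.values())
    return {cat: 100.0 * val / grand_total for cat, val in totals.items()}


# Hypothetical mean |SHAP| values per feature (illustrative only).
example_shap = {
    "dist_to_industries": 0.5,
    "calcium": 0.2,
    "magnesium": 0.1,
    "terrain_ruggedness_index": 0.2,
}

# Hypothetical mapping of features to predictor categories.
example_categories = {
    "dist_to_industries": "HAF",
    "calcium": "PSP",
    "magnesium": "PSP",
    "terrain_ruggedness_index": "LSF",
}

shares = category_contributions(example_shap, example_categories)
for category, pct in shares.items():
    print(f"{category}: {pct:.1f}%")
```

Because the shares are normalized, they describe how the model's attributions are split across categories rather than how much variance each category explains.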

The online version contains supplementary material available at 10.1038/s41598-026-40377-x.

## Linked entities

- **Chemicals:** arsenic (PubChem CID 5359596), cadmium (PubChem CID 23973), cobalt (PubChem CID 104730), chromium (PubChem CID 23976), lead (PubChem CID 5352425)

## Full-text entities

- **Chemicals:** PTEs (-), Cr (MESH:D002857), Co (MESH:D003035), magnesium (MESH:D008274), calcium (MESH:D002118), Pb (MESH:D007854), As (MESH:D001151), Cd (MESH:D002104)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13022290/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13022290/full.md

## References

11 references — full list in the complete paper: https://tomesphere.com/paper/PMC13022290/full.md

---
Source: https://tomesphere.com/paper/PMC13022290