# Integrated machine learning and positive matrix factorization for the source-specific contamination and predictive risk assessment of potentially toxic elements in multi-land-use soils around an active coal mine

**Authors:** Zahid Bashir, Deep Raj, Rangabhashiyam Selvasembian

PMC · DOI: 10.1039/d5ra09789d · RSC Advances · 2026-02-19

## TL;DR

This study combines positive matrix factorization, machine learning, and geospatial analysis to assess potentially toxic element contamination in soils around an active coal mine, apportioning contamination sources and quantifying ecological and human health risks.

## Contribution

An integrated framework combining PMF, machine learning, and geospatial analysis for source-specific contamination and predictive risk assessment of PTEs in mining-impacted soils.
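For readers unfamiliar with PMF, the standard receptor-model formulation is sketched below. The symbols are the conventional ones from the general PMF literature and are assumptions here, since this summary does not reproduce the paper's own notation: $x_{ij}$ is the concentration of element $j$ in sample $i$, $g_{ik}$ the contribution of source factor $k$ to sample $i$, $f_{kj}$ the profile of factor $k$, $e_{ij}$ the residual, and $u_{ij}$ the measurement uncertainty.

$$
x_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij},
\qquad
Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \frac{e_{ij}}{u_{ij}} \right)^{2},
\qquad
g_{ik} \ge 0,\; f_{kj} \ge 0
$$

The factorization minimizes $Q$ under the non-negativity constraints. In this study $n = 120$ samples, and the reported solution uses $p = 4$ factors.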

## Key findings

- Mixed industrial-mining activities were identified as the dominant source of contamination (∼49%).
- A random forest model achieved strong predictive performance for PTE concentrations (average R² = 0.82, average RMSE = 19.6 mg kg⁻¹).
- Cd and Hg posed high ecological risks, while Cr and Co were the main drivers of carcinogenic and non-carcinogenic risks in children, respectively.
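
The prominence of Cd and Hg in the ecological risk finding can be illustrated with a minimal sketch of a Hakanson-style potential ecological risk index. This summary does not state which index the authors used, so the choice of index is an assumption. The toxic-response factors are the values commonly cited from Hakanson (1980), and all concentrations and background values are hypothetical, not data from the paper.

```python
# Sketch of a Hakanson-style potential ecological risk index (assumed index;
# the summary does not name the one used in the paper).
# Toxic-response factors as commonly cited from Hakanson (1980).
TOXIC_RESPONSE = {"Cd": 30, "Hg": 40, "Cr": 2, "Zn": 1}


def ecological_risk(conc, background):
    """Return per-element risk factors Er = Tr * (C / B) and their sum RI."""
    er = {el: TOXIC_RESPONSE[el] * (conc[el] / background[el]) for el in conc}
    return er, sum(er.values())


# Hypothetical concentrations and background values in mg/kg (not from the paper).
sample = {"Cd": 1.5, "Hg": 0.25, "Cr": 90.0, "Zn": 200.0}
background = {"Cd": 0.5, "Hg": 0.125, "Cr": 90.0, "Zn": 100.0}

er, ri = ecological_risk(sample, background)
for element, value in sorted(er.items(), key=lambda kv: -kv[1]):
    print(f"{element}: Er = {value:.1f}")
print(f"RI = {ri:.1f}")
```

In this toy sample Hg and Zn are both enriched twofold over background, yet Hg contributes far more to the index because of its much larger toxic-response factor. That weighting is one reason Cd and Hg often dominate ecological risk even when other elements are more enriched.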

## Abstract

Investigating the distribution, sources, and risks of potentially toxic elements (PTEs) in mining-impacted soils is critical for effective environmental monitoring and human health protection. However, traditional assessments often fail to integrate spatial, source-oriented, and predictive approaches, limiting comprehensive understanding. In this study, 120 soil samples were collected from five land-use types surrounding an active opencast coal mine in the Godavari Valley coalfields, India. Pollution indices revealed severe multi-metal contamination, with Co and Cd emerging as the most consistently enriched elements across land uses, while Zn showed pronounced but spatially restricted enrichment, particularly in coal mine soils. An integrated framework combining positive matrix factorization (PMF), machine learning, and geospatial analysis was developed to identify source-specific contamination patterns. A robust four-factor PMF solution identified mixed industrial-mining activities as the dominant source (∼49%) of contamination. A random forest (RF) model integrating soil properties, spatial variables, and PMF-derived source contributions demonstrated strong to moderate predictive performance (average R² = 0.82) with an average root mean square error (RMSE) of 19.6 mg kg⁻¹. Geostatistical mapping highlighted coal mines and adjacent agricultural areas as persistent contamination hotspots. Ecological risk assessment indicated Cd and Hg as the principal contributors to high ecological risks, particularly in agricultural and roadside soils. Probabilistic health risk assessment revealed unacceptable risks for the local population, with children being the most vulnerable. Cr was identified as the primary driver of carcinogenic risk, contributing ∼81% in children, while Co dominated non-carcinogenic risks, resulting in hazard indices for children approaching unacceptable thresholds across all land uses. Our findings provide a precise and scientific framework for source-specific risk assessment to target soil remediation and environmental management in mining-impacted landscapes worldwide.
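
The two performance metrics quoted for the RF model, R² and RMSE, can be made concrete with a short dependency-free sketch. The function names and the observed and predicted values below are hypothetical illustrations, not the paper's data or code.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical measured vs. predicted concentrations in mg/kg (not from the paper).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"R2 = {r2_score(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f} mg/kg")
```

R² is unitless and describes the share of variance explained, whereas RMSE carries the units of the predicted quantity. An RMSE averaged across elements with very different concentration ranges therefore needs to be read alongside the per-element results in the full paper.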

Using geospatial analysis, receptor modelling, and machine learning, this study evaluates potentially toxic element contamination across land uses near an active coal mine, linking distribution, sources, and risk assessment.
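
As a rough illustration of what a probabilistic health risk assessment involves, the sketch below runs a Monte Carlo simulation of a non-carcinogenic hazard quotient for a child via soil ingestion only. It uses the widely used average-daily-dose form (concentration × ingestion rate × exposure frequency × exposure duration, divided by body weight × averaging time). Every distribution, parameter value, and name here, including the reference dose, is a hypothetical placeholder and not taken from the paper, which may also include other exposure pathways.

```python
import random

HYPOTHETICAL_RFD = 3e-4  # mg/kg/day, placeholder reference dose (assumption)


def simulate_hazard_quotient(n_draws, rfd, seed=0):
    """Monte Carlo draws of HQ = ADD / RfD for soil ingestion by a child."""
    rng = random.Random(seed)
    exposure_freq = 350          # days per year (assumed)
    exposure_dur = 6             # years (assumed)
    averaging_time = exposure_dur * 365  # days
    hqs = []
    for _ in range(n_draws):
        conc = rng.lognormvariate(3.0, 0.5)        # soil concentration, mg/kg
        ing_rate = rng.uniform(100.0, 200.0)       # soil ingestion, mg/day
        body_weight = max(rng.normalvariate(15.0, 2.0), 5.0)  # kg, floored
        # Average daily dose in mg/kg/day; 1e-6 converts mg of soil to kg.
        add = (conc * ing_rate * 1e-6 * exposure_freq * exposure_dur
               / (body_weight * averaging_time))
        hqs.append(add / rfd)
    return hqs


hqs = simulate_hazard_quotient(10_000, HYPOTHETICAL_RFD)
ranked = sorted(hqs)
median_hq = ranked[len(ranked) // 2]
p95_hq = ranked[int(0.95 * (len(ranked) - 1))]
share_above_one = sum(hq > 1.0 for hq in hqs) / len(hqs)
print(f"median HQ = {median_hq:.2f}, 95th percentile = {p95_hq:.2f}")
print(f"share of draws with HQ > 1: {share_above_one:.1%}")
```

A hazard index would sum such quotients across elements and exposure pathways. Reporting an upper percentile rather than a single point estimate is what distinguishes the probabilistic approach from a deterministic one.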

## Linked entities

- **Chemicals:** Co (PubChem CID 281), Cd (PubChem CID 23973), Zn (PubChem CID 23994), Hg (PubChem CID 23931), Cr (PubChem CID 23976)

## Full-text entities

- **Genes:** PRB1 (proline rich protein BstNI subfamily 1) [NCBI Gene 5542] {aka PM, PMF, PMS, PRB1L, PRB1M}, TRBV20OR9-2 (T cell receptor beta variable 20/OR9-2 (non-functional)) [NCBI Gene 6962] {aka CDR3, TCRBV20S2, TCRBV2O, TCRBV2S2O}
- **Diseases:** PTEs (MESH:C537245), NCR (MESH:C580335), respiratory malignancies (MESH:D012131), cardiovascular and renal toxicity (MESH:D007674), CF (MESH:D005171), renal, pulmonary, and cardiovascular disorders (MESH:D002318), cancer (MESH:D009369), CM (MESH:D055008), toxicity (MESH:D064420), HI (MESH:C566784), metal (MESH:D013651), CR (MESH:D011230)
- **Chemicals:** Pb (MESH:D007854), iron oxides (MESH:C000499), As (MESH:D001151), HCl (MESH:D006851), Cd (MESH:D002104), Cu (MESH:D003300), HNO3 (MESH:D017942), lime (MESH:C016538), water (MESH:D014867), Co (MESH:D003035), ammonium acetate (MESH:C018824), Ni (MESH:D009532), Cr (MESH:D002857), N% (MESH:D009584), pyrite (MESH:C011342), C% (MESH:D002244), H2O2 (MESH:D006861), metal (MESH:D008670), CM (-), phosphate (MESH:D010710), Hg (MESH:D008628), Zn (MESH:D015032), sulphide (MESH:D013440)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** MESS-4 — Homo sapiens (Human), Ataxia telangiectasia syndrome, Finite cell line (CVCL_F083)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12919393/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12919393/full.md

## References

94 references — full list in the complete paper: https://tomesphere.com/paper/PMC12919393/full.md

---
Source: https://tomesphere.com/paper/PMC12919393