Smart Ensemble Learning Framework for Predicting Groundwater Heavy Metal Pollution
T. Ansah-Narh, G. Y. Afrifa, J. B. Tandoh, K. Asare, M. Addi, K. E. Yorke, D. M. A. Akpoley, K. Aidoo, S. K. Fosuhene

TL;DR
This study develops a robust ensemble machine learning framework incorporating response transformations and clustering diagnostics to predict groundwater heavy metal pollution more accurately and interpretably.
Contribution
It introduces a distribution-aware ensemble approach with Gaussian copula transformation and clustering diagnostics for improved groundwater pollution prediction.
Findings
Gaussian copula transformation yielded the most reliable predictions.
Copula-based models improved residuals and spatial mapping.
Clustering identified Fe and Mn as key pollution contributors.
Abstract
Groundwater in the Densu Basin is increasingly threatened by heavy metal contamination, but conventional methods fail to capture the statistical complexity and spatial heterogeneity of pollution indicators. A key challenge is modelling the Heavy Metal Pollution Index (HPI), which is typically skewed and affected by correlated contaminants, leading to biased predictions without transformation. This study develops a predictive framework integrating response transformations with nested cross-validated ensemble machine learning. Three transformations (raw, log, and Gaussian copula) were applied to HPI and evaluated across six learners: support vector regression (SVM), k-nearest neighbours (k-NN), CART, Elastic Net, kernel ridge regression, and a stacked Lasso ensemble. Raw-scale models produced deceptively high fits (Elastic Net and stacked ensemble), suggesting…
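The abstract does not spell out how the Gaussian copula transformation of HPI is computed. One common reading is a rank-based normal-score transform: replace each value by its rank, turn the rank into a probability, and map that probability through the inverse standard normal CDF. The sketch below illustrates that idea under this assumption only. The function name `normal_scores`, the `rank / (n + 1)` scaling, and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import NormalDist


def normal_scores(values):
    """Rank-based Gaussian (normal-score) transform of a 1-D sample.

    Assumed illustration of a copula-style response transform;
    not the authors' implementation.
    """
    n = len(values)
    # Indices of the observations in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank across the tie
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Dividing by n + 1 keeps every probability strictly inside (0, 1).
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


# Made-up, right-skewed values standing in for HPI measurements.
hpi = [12.0, 15.5, 18.2, 22.0, 40.3, 95.0, 310.0]
z = normal_scores(hpi)
```

Because the transform depends only on ranks, the extreme value 310.0 no longer dominates the response scale, which is the kind of skew the abstract says biases untransformed models. Predictions made on the transformed scale would need to be mapped back through the empirical quantiles before being reported as HPI.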
