# Ensemble machine learning strategies for mineral prospectivity mapping under data scarcity

**Authors:** Poorya Amirajlo, Hossein Hassani, Amin Beiranvand Pour, Narges Habibkhah

PMC · DOI: 10.1038/s41598-026-40125-1 · Scientific Reports · 2026-02-15

## TL;DR

This paper introduces a reliability-centered machine learning framework for mineral prospectivity mapping when labeled data are scarce, prioritizing stable, well-calibrated, and reproducible predictions over marginal accuracy gains.

## Contribution

A reliability-centered framework for mineral prospectivity mapping under data scarcity, prioritizing calibration and reproducibility.

## Key findings

- Under extreme data scarcity, Grid Search delivers more stable and better-calibrated probabilities than Random Search or Bayesian Optimization.
- The SVM + GNB ensemble tuned with Grid Search achieved a ROC-AUC of 0.90 with the most consistent calibration curve.
- Bayesian Optimization reached the highest ROC-AUC (0.95) but with less consistent calibration.
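
The trade-off in these findings, where a higher AUC does not guarantee trustworthy probabilities, can be illustrated with a minimal sketch. The labels and scores below are invented toy values, not data from the paper, and the function names are hypothetical. The Brier score and a binned expected calibration error are used here as common calibration measures; the summary does not state which measures the authors used beyond calibration curves.

```python
from typing import List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 5
) -> float:
    """Weighted mean of |observed frequency - mean predicted probability| per bin."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        bins[min(int(p * n_bins), n_bins - 1)].append((y, p))
    ece = 0.0
    for cell in bins:
        if not cell:
            continue
        observed = sum(y for y, _ in cell) / len(cell)
        predicted = sum(p for _, p in cell) / len(cell)
        ece += len(cell) / len(y_true) * abs(observed - predicted)
    return ece


# Toy data (assumption): 6 barren cells, 4 mineralized cells.
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
# Model A ranks perfectly but squeezes every score near 1 (overconfident).
model_a = [0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99]
# Model B makes one ranking mistake but its scores track observed frequencies.
model_b = [0.1, 0.1, 0.2, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]

for name, prob in (("A", model_a), ("B", model_b)):
    print(
        name,
        round(roc_auc(y, prob), 3),
        round(brier_score(y, prob), 3),
        round(expected_calibration_error(y, prob), 3),
    )
```

In this toy case model A has the higher AUC, yet its probabilities would be misleading if read as prospectivity estimates, which is why a reliability-centered comparison reports calibration alongside discrimination.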

## Abstract

In mineral prospectivity mapping (MPM), the scarcity of labeled data and severe class imbalance often undermine the stability and reliability of machine learning models. This study advances a reliability-centered framework that prioritizes calibration and reproducibility over marginal accuracy gains when training data are limited. Two ensemble configurations, Light Gradient Boosting Machine combined with AdaBoost, and Support Vector Machine combined with Gaussian Naive Bayes, were systematically evaluated using three hyperparameter optimization strategies: Grid Search, Random Search, and Bayesian Optimization. The Synthetic Minority Oversampling Technique (SMOTE) and five-fold cross-validation were applied to counteract data imbalance and improve model robustness. Results from the Dehaq Pb–Zn district, within the Sanandaj–Sirjan Zone of Iran, reveal that while Bayesian Optimization can yield slightly higher Receiver Operating Characteristic (ROC)-Area Under the Curve (AUC) scores, Grid Search consistently delivers more stable, well-calibrated probabilities under extreme data scarcity. The SVM + GNB ensemble tuned via Grid Search demonstrated superior balance between discrimination and reliability, achieving an AUC of 0.90 with the most consistent calibration curve, whereas the Bayesian Optimization configuration marginally reached the highest AUC (0.95). These findings highlight that stability and probability calibration are decisive for trustworthy mineral prospectivity modeling in data-scarce environments. The proposed framework provides a reproducible and interpretable pathway for translating machine learning predictions into reliable exploration decisions, supporting risk-aware targeting in early-stage mineral exploration.
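
The workflow described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic (`make_classification`), the hyperparameter grid is hypothetical, soft voting is assumed as the way SVM and GNB are combined, and `smote` is a simplified NumPy re-implementation of the interpolation idea rather than a library call. Oversampling is applied to the training folds only, so that synthetic points never leak into validation.

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def smote(X_min, n_new, k=5, rng=None):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority-class neighbours (needs >= 2 minority samples)."""
    rng = np.random.default_rng(rng)
    k = min(k, len(X_min) - 1)
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


def oversample(X, y, seed):
    """Add synthetic positives until both classes have equal counts."""
    n_new = int((y == 0).sum() - (y == 1).sum())
    if n_new <= 0:
        return X, y
    X_syn = smote(X[y == 1], n_new, rng=seed)
    return np.vstack([X, X_syn]), np.concatenate([y, np.ones(n_new, dtype=y.dtype)])


def make_ensemble(C, gamma):
    """SVM + Gaussian Naive Bayes, combined by averaging probabilities
    (soft voting is an assumption of this sketch)."""
    svm = make_pipeline(
        StandardScaler(), SVC(C=C, gamma=gamma, probability=True, random_state=0)
    )
    return VotingClassifier([("svm", svm), ("gnb", GaussianNB())], voting="soft")


def grid_search(X, y, grid, n_splits=5, seed=0):
    """Exhaustive search over the grid with stratified k-fold validation."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = []
    for C, gamma in product(grid["C"], grid["gamma"]):
        aucs, briers = [], []
        for train_idx, val_idx in cv.split(X, y):
            X_tr, y_tr = oversample(X[train_idx], y[train_idx], seed)
            model = make_ensemble(C, gamma).fit(X_tr, y_tr)
            prob = model.predict_proba(X[val_idx])[:, 1]
            aucs.append(roc_auc_score(y[val_idx], prob))
            briers.append(brier_score_loss(y[val_idx], prob))
        results.append({
            "C": C, "gamma": gamma,
            "auc_mean": float(np.mean(aucs)), "auc_std": float(np.std(aucs)),
            "brier_mean": float(np.mean(briers)),
        })
    return results


# Synthetic, imbalanced stand-in for evidence layers (not the Dehaq data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           weights=[0.85, 0.15], random_state=0)
results = grid_search(X, y, {"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.1]})
best = max(results, key=lambda r: r["auc_mean"])
print(best)
```

Reporting `auc_std` and `brier_mean` next to `auc_mean` reflects the paper's emphasis: a configuration is judged on fold-to-fold stability and calibration as well as on its average AUC. Because Grid Search enumerates a fixed grid, rerunning it with the same folds evaluates exactly the same candidates, which helps reproducibility.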

## Full-text entities

- **Chemicals:** Ti (MESH:D014025), Cu (MESH:D003300), goethite (MESH:C094886), carbonate (MESH:D002254), oxide (MESH:D010087), Lead (MESH:D007854), hematite (MESH:C000499), Fe (MESH:D007501), limonite (MESH:C021024), As (MESH:D001151), limestone (MESH:D002119), Zn (MESH:D015032), Au (MESH:D006046), Co (MESH:D003035), Cr (MESH:D002857), Mn (MESH:D008345), Mo (MESH:D008982), MVT (-), Cd (MESH:D002104), T (MESH:D014316), Hg (MESH:D008628)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12996581/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12996581/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12996581/full.md

---
Source: https://tomesphere.com/paper/PMC12996581