# Integrating optimization and machine learning for estimating water resistivity and saturation in shaley sand reservoirs

**Authors:** Muhammad A. El Hameedy, Walid M. Mabrouk, Ahmed M. Metwally

PMC · DOI: 10.1038/s41598-026-36133-w · Scientific Reports · 2026-02-11

## TL;DR

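This paper combines numerical optimization with machine learning to estimate formation water resistivity (Rw) and predict water saturation (Sw) in shaley-sand reservoirs, using well logs from 11 wells in the Norwegian North Sea and the Egyptian Western Desert.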

## Contribution

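The paper proposes a framework that integrates numerical optimization and ML for petrophysical evaluation of shaley-sand reservoirs: an optimization-derived Rw is used to generate the Sw labels on which the ML models are trained.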

## Key findings

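- Of the four optimization algorithms evaluated, Powell and Nelder-Mead performed best, recovering Rw from log data with an RMSE of 1×10⁻⁴ against measured samples.
- On blind test wells, the top-performing ML models (LSTM, CatBoost, and XGBoost) predicted water saturation with R² up to 0.944, MAE as low as 0.03, and RMSE as low as 0.050.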
- The integrated framework reduces reliance on core analyses and improves hydrocarbon estimation accuracy.
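
The Rw estimation step in the first finding can be illustrated with a small, self-contained sketch. It is not the authors' implementation: this summary does not state the objective function or the saturation model used, so the example assumes a clean, fully water-bearing interval (Sw = 1) described by Archie's equation with hypothetical constants `A` and `M`, synthetic porosity and resistivity values, and a simplified textbook Nelder-Mead routine written in pure Python. In practice a library routine such as SciPy's `scipy.optimize.minimize` with `method="Nelder-Mead"` or `method="Powell"` would normally be used.

```python
import math
import random

def nelder_mead(f, x0, step=0.5, xtol=1e-8, max_iter=500):
    """Simplified Nelder-Mead: reflection, expansion, inside contraction, shrink."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):  # one extra vertex offset along each axis
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        best, worst = simplex[0], simplex[-1]
        size = max(abs(v[j] - best[j]) for v in simplex for j in range(n))
        if size < xtol:  # stop once the simplex has collapsed
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:  # shrink every vertex halfway toward the best one
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]

# Assumed Archie constants (hypothetical, not taken from the paper).
A, M = 1.0, 2.0

def archie_rt_water_zone(rw, phi):
    """Archie true resistivity for Sw = 1: Rt = A * Rw / phi**M."""
    return A * rw / phi ** M

# Synthetic "log" data: known Rw plus small multiplicative noise on Rt.
rng = random.Random(0)
TRUE_RW = 0.05  # ohm-m, hypothetical
porosity = [0.15 + 0.003 * i for i in range(50)]
rt_measured = [
    archie_rt_water_zone(TRUE_RW, p) * 10 ** rng.gauss(0.0, 0.02)
    for p in porosity
]

def misfit(params):
    """RMSE between modeled and measured log10(Rt); params[0] is log10(Rw)."""
    rw = 10.0 ** params[0]  # optimizing log10(Rw) keeps Rw positive
    sq = [
        (math.log10(archie_rt_water_zone(rw, p)) - math.log10(rt)) ** 2
        for p, rt in zip(porosity, rt_measured)
    ]
    return math.sqrt(sum(sq) / len(sq))

solution, rmse_log = nelder_mead(misfit, [0.0])  # start from Rw = 1 ohm-m
rw_est = 10.0 ** solution[0]
print(f"estimated Rw = {rw_est:.4f} ohm-m, log-space RMSE = {rmse_log:.4f}")
```

Searching over log10(Rw) rather than Rw is a design choice of this sketch: it keeps the estimate positive without needing bound constraints, which a basic Nelder-Mead routine does not handle.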

## Abstract

Accurate characterization of shaley-sand reservoirs remains a significant challenge in petroleum geophysics, where complex clay mineralogy often renders traditional evaluation methods unreliable. This study introduces an integrated, data-driven framework that synergizes numerical optimization and machine learning (ML) to accurately estimate formation water resistivity (Rw) and predict water saturation (Sw), overcoming the limitations of data scarcity. The workflow begins with rigorous preprocessing of well log data from 11 wells across the Norwegian North Sea and Egyptian Western Desert. First, we establish a robust, physically-constrained Rw by evaluating four optimization algorithms. The Powell and Nelder-Mead algorithms emerged as superior, demonstrating the ability to rapidly recover the true Rw from log data with low error (RMSE of 1×10⁻⁴) against measured samples. This optimized Rw then serves as a high-quality "pseudo-core" label to generate a continuous Sw log for training a comprehensive suite of ML models, including ensemble methods (Random Forest, CatBoost, XGBoost) and neural networks (ANNs, LSTM), to predict Sw. The models demonstrated predictive accuracy, validated by a robust 5-fold cross-validation protocol. On the blind test wells, the top-performing models (LSTM, CatBoost, and XGBoost) achieved a coefficient of determination (R²) up to 0.944, with Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) as low as 0.03 and 0.050, respectively. The automated fusion of optimization-derived physics with ML-driven prediction marks a transformative step toward more reliable, data-centric petrophysical workflows. This integrated framework offers a significant enhancement in reservoir characterization, providing a cost-effective and scalable methodology that reduces reliance on expensive core analyses and improves the accuracy of hydrocarbon-in-place estimations.
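
For readers who want to see how the three reported metrics relate, the sketch below computes R², MAE, and RMSE from their standard definitions. The function name `regression_metrics` and the Sw values are hypothetical and chosen only for illustration; they are not data or code from the paper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, and RMSE using their standard definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)           # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
    }

# Hypothetical reference and predicted water saturations (fractions).
sw_reference = [0.20, 0.35, 0.50, 0.65, 0.80]
sw_predicted = [0.25, 0.30, 0.50, 0.70, 0.75]

metrics = regression_metrics(sw_reference, sw_predicted)
print(metrics)
```

Because RMSE squares the residuals before averaging, it is never smaller than MAE on the same data, which is consistent with the reported pair of 0.050 and 0.03.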

The online version contains supplementary material available at 10.1038/s41598-026-36133-w.

## Full-text entities

- **Chemicals:** hydrocarbon (MESH:D006838), water (MESH:D014867)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12905170/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12905170/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/PMC12905170/full.md

---
Source: https://tomesphere.com/paper/PMC12905170