# Machine learning driven identification of optimal nanomaterials for efficient pararosaniline dye removal from water using a RFHGB hybrid model

**Authors:** Ganesan Anandhi, M. Iyapparaja

PMC · DOI: 10.1039/d5ra09598k · RSC Advances · 2026-03-04

## TL;DR

A machine learning model predicts the best nanomaterials for removing a toxic dye from water, identifying a ZnO–CuO composite as the most effective.

## Contribution

A novel RFHGB hybrid machine learning model with synthetic data augmentation is introduced for predicting photocatalytic dye degradation.

## Key findings

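- Among ten pairwise hybrids built from the top five of fifteen algorithms, the RFHGB-hybrid model (Random Forest + HistGradient Boosting) achieved the highest accuracy and lowest prediction error for pararosaniline (PRS) dye degradation.
- The ZnO–CuO nanocomposite was identified as the most efficient photocatalyst for PRS removal, ahead of SrO.
- Synthetic data augmentation expanded the 81 experimental observations to 5000 data points for model training.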
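
The augmentation step in the last finding can be illustrated with a minimal sketch. This summary does not say which augmentation method the authors used, so the approach below (resampling rows with replacement and adding small Gaussian jitter) is an assumption, as are the function name `augment_with_jitter`, the `noise_frac` parameter, the four placeholder features, and the choice to treat degradation efficiency as a percentage clipped to [0, 100]. The input data are random placeholders, not the paper's measurements.

```python
import numpy as np


def augment_with_jitter(X, y, n_total, noise_frac=0.02, seed=0):
    """Resample rows with replacement and add small Gaussian noise.

    Hypothetical helper: the paper's actual augmentation method is not
    described in this summary.
    """
    rng = np.random.default_rng(seed)
    # Draw n_total row indices from the original experiments.
    idx = rng.integers(0, len(X), size=n_total)
    # Noise scale is a fraction of each column's standard deviation.
    x_scale = noise_frac * X.std(axis=0)
    y_scale = noise_frac * y.std()
    X_aug = X[idx] + rng.normal(0.0, 1.0, size=(n_total, X.shape[1])) * x_scale
    y_aug = y[idx] + rng.normal(0.0, 1.0, size=n_total) * y_scale
    # Assumption: efficiency is a percentage, so keep it within [0, 100].
    return X_aug, np.clip(y_aug, 0.0, 100.0)


# Placeholder inputs standing in for 81 experiments with 4 process variables.
rng = np.random.default_rng(42)
X_exp = rng.uniform(0.0, 1.0, size=(81, 4))
y_exp = rng.uniform(0.0, 100.0, size=81)

X_aug, y_aug = augment_with_jitter(X_exp, y_exp, n_total=5000)
print(X_aug.shape, y_aug.shape)
```

Because every synthetic row derives from a real one, any held-out evaluation in a setup like this should split the original experiments before augmenting, so that jittered copies of a test point do not leak into training.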

## Abstract

Water pollution by emerging contaminants poses a particular threat to environmental and public health and requires advanced treatment technologies beyond conventional approaches. Pararosaniline (PRS), a dye widely used in textile and biological staining applications, combines strong chemical stability, low biodegradability, and high toxicity, which makes its complete removal from wastewater difficult. In this study, a ZnO–CuO nanocomposite and SrO photocatalysts were synthesized experimentally and evaluated for photocatalytic degradation of PRS under controlled conditions. A dataset of 81 experimental observations was computationally expanded to 5000 data points using synthetic data augmentation. Fifteen machine learning algorithms were trained to predict degradation efficiency, and the top five models were identified based on their performance metrics. Pairwise hybridization of these five models produced ten hybrid combinations, of which the Random Forest + HistGradient Boosting hybrid model (RFHGB-hybrid model) demonstrated the highest accuracy and lowest prediction error. The model also provided optimal degradation conditions and a catalyst ranking, identifying ZnO–CuO as the best-performing photocatalyst.

An RFHGB machine learning model integrated with synthetic data augmentation accurately predicts photocatalytic degradation of pararosaniline. It also identifies ZnO–CuO as the most efficient catalyst.

## Linked entities

- **Chemicals:** pararosaniline (PubChem CID 11293)

## Full-text entities

- **Genes:** SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}
- **Diseases:** bacterial infections (MESH:D001424), toxicity (MESH:D064420), haematological disorders (MESH:D006402), carcinogenic (MESH:D011230), respiratory distress (MESH:D012128), skin irritation (MESH:D012871)
- **Chemicals:** reactive oxygen species (MESH:D017382), biochar (MESH:C540010), carbon dioxide (MESH:D002245), pararosaniline (MESH:C005409), malachite green (MESH:C005095), triphenylmethane (MESH:C046945), Metal oxide (-), superoxide (MESH:D013481), crystal violet (MESH:D005840), CuO (MESH:C030973), hydroxyl radicals (MESH:D017665), ZnO (MESH:D015034), Water (MESH:D014867), methylene blue (MESH:D008751), carbon (MESH:D002244), PRS (MESH:D011221)
- **Species:** Homo sapiens (human, species) [taxon 9606], activated sludge metagenome (species) [taxon 942017]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12959327/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12959327/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12959327/full.md

---
Source: https://tomesphere.com/paper/PMC12959327