# Machine learning approaches for data-driven hydrocarbon bioaugmentation and phytoremediation: the role of multi-omics insights

**Authors:** Ugochukwu Chukwuma Okafor, Saeed M. Alghamdi, Lorna Anguilano, Yang Yang

PMC · DOI: 10.3389/fmicb.2026.1742848 · 2026-03-05

## TL;DR

This paper reviews how machine learning and multi-omics data can improve hydrocarbon cleanup using microbes and plants.

## Contribution

The paper reviews how integrating ML with multi-omics data can optimize bioaugmentation and phytoremediation of hydrocarbons, particularly PAHs.

## Key findings

- ML models can predict effective microbial strains and plant species for hydrocarbon degradation.
- Combining multi-omics data with ML helps identify key genes, enzymes, and metabolic pathways involved in hydrocarbon degradation.
- Adaptive ML models and real-time monitoring are needed for large-scale bioremediation success.

## Abstract

Hydrocarbon contamination, particularly with polycyclic aromatic hydrocarbons (PAHs), poses a significant global environmental challenge because of its persistence and its carcinogenic effects on ecosystems and human health. This review explores how machine learning (ML) algorithms can enhance the efficiency of bioaugmentation and phytoremediation through predictive modeling, real-time optimization of microbial consortia, and plant species selection. Traditional bioremediation methods, such as bioaugmentation and phytoremediation, are characterized by slow degradation rates and suboptimal performance in complex, multi-contaminant environments. Using ML models with multi-omics data offers an advanced predictive approach to optimizing bioremediation processes by providing a systematic understanding of microbial and plant-mediated hydrocarbon degradation. Supervised learning methods such as support vector machines and neural networks can predict which microbial strains or plant species will effectively degrade hydrocarbons under specific environmental conditions. Additionally, combining multi-omics data with ML facilitates the identification of critical genes, enzymes, and metabolic pathways involved in hydrocarbon degradation and offers insights into the molecular mechanisms that drive bioremediation. The translation of laboratory-based ML models into large-scale, real-world bioremediation strategies is hindered by the complex, dynamic nature of contaminated environments. This review highlights these hindrances and provides directions for future research, including the development of field-deployable technologies, adaptive ML models, and real-time environmental monitoring strategies. The integration of ML with multi-omics holds substantial promise for enhancing the efficiency, adaptability, and scalability of bioremediation strategies, ultimately mitigating the carcinogenic risks often associated with the hydrocarbon-polluted lithosphere.

## Full-text entities

- **Diseases:** carcinogenic (MESH:D011230)
- **Chemicals:** Hydrocarbon (MESH:D006838), PAHs (MESH:D011084)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12999933/full.md

---
Source: https://tomesphere.com/paper/PMC12999933