# Enhancing CYP3A4 Inhibition Prediction Using a Hybrid GNN–ML Model with Data Augmentation

**Authors:** Somin Woo, Ju-Hyeok Jeon, Sangil Han, Changkyu Lee, Sang-Hyun Min

PMC · DOI: 10.3390/ph19020258 · Pharmaceuticals · 2026-02-02

## TL;DR

This paper introduces a hybrid model that combines gradient-boosted machine learning models with graph neural networks to predict the percentage inhibition of CYP3A4, a major drug-metabolizing liver enzyme, supporting earlier assessment of drug–drug interaction risk.

## Contribution

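A hybrid model that combines a weighted ML ensemble with a GNN (O-GNN + CL + Mixup) and uses data augmentation (manifold mixup during training, SMILES enumeration-based test-time augmentation at inference) for more accurate and interpretable CYP3A4 inhibition prediction.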

## Key findings

- The weighted ML ensemble achieved RMSE = 19.1031 and PCC = 0.7566 for CYP3A4 inhibition prediction.
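- The hybrid model improved slightly on both the weighted ML ensemble and the best GNN model (O-GNN + CL + Mixup: RMSE = 20.1002, PCC = 0.7265), reaching RMSE = 19.0784 and PCC = 0.7570.
- External validation on 100 newly generated experimental data points confirmed generalizability (Custom Metric = 0.8035).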
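The two metrics quoted in the findings above, RMSE and the Pearson correlation coefficient (PCC), can be computed as in the following minimal sketch. It uses their standard definitions; the function names and the toy inhibition values are illustrative assumptions and are not taken from the paper, and the paper's Custom Metric is not reproduced here because its definition is not given in this summary.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_t * var_p)


# Hypothetical inhibition percentages, for illustration only.
observed = [10.0, 35.0, 60.0, 85.0]
predicted = [20.0, 30.0, 55.0, 90.0]
print(f"RMSE = {rmse(observed, predicted):.4f}")
print(f"PCC  = {pearson(observed, predicted):.4f}")
```

Because the target is inhibition in percent, RMSE is expressed in percentage points, while PCC is unitless and only measures how well predictions track the ranking and linear trend of the observations.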

## Abstract

**Background/Objectives:** Cytochrome P450 3A4 (CYP3A4) metabolizes approximately 30–50% of clinically used drugs; thus, accurate prediction of CYP3A4 inhibition is essential for early assessment of drug–drug interaction (DDI) risk and toxicity. This study evaluated an integrated artificial intelligence framework for predicting CYP3A4 inhibition (%) using a large, curated chemical dataset.

**Methods:** A dataset of 23,713 compounds was compiled from the Korea Chemical Bank and multiple commercial and public databases. Vector-based machine learning (ML) models (LightGBM, XGBoost, CatBoost, and a weighted ML ensemble) and graph neural network (GNN) models (O-GNN with contrastive learning and manifold mixup (O-GNN + CL + Mixup), D-MPNN, GINE, and GATv2) were evaluated. Manifold mixup was applied during GNN training, and SMILES enumeration-based test-time augmentation was used at inference. The best-performing ML and GNN models were integrated using a weighted ensemble strategy. Model interpretability was examined using SHAP analysis for ML models and occlusion sensitivity analysis for O-GNN + CL + Mixup.

**Results:** The weighted ML ensemble achieved the best performance among ML models (RMSE = 19.1031, Pearson correlation coefficient (PCC) = 0.7566); the O-GNN + CL + Mixup model performed the best among GNN models (RMSE = 20.1002, PCC = 0.7265). The hybrid model achieved improved predictive accuracy (RMSE = 19.0784, PCC = 0.7570). External validation on 100 newly generated experimental data points confirmed generalizability (Custom Metric = 0.8035).

**Conclusions:** This study demonstrated that integrating ML and GNN models with data augmentation strategies improves the robustness and interpretability of CYP3A4 inhibition prediction and established a practical framework for metabolic screening and DDI risk assessment.
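The two inference-time steps named in the abstract, test-time augmentation over equivalent SMILES strings and a weighted blend of ML and GNN predictions, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the ensemble weight was chosen, so the grid search over a validation set is an assumption, and `tta_mean`, `blend`, `best_weight`, `toy_predict`, and all numbers are hypothetical. SMILES enumeration itself needs a cheminformatics toolkit and is not shown; the variants are assumed to be precomputed.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def tta_mean(predict, smiles_variants):
    """Average a model's predictions over equivalent SMILES of one molecule."""
    if not smiles_variants:
        raise ValueError("at least one SMILES variant is required")
    return sum(predict(s) for s in smiles_variants) / len(smiles_variants)


def blend(ml_preds, gnn_preds, w):
    """Weighted ensemble: w * ML prediction + (1 - w) * GNN prediction."""
    return [w * m + (1.0 - w) * g for m, g in zip(ml_preds, gnn_preds)]


def best_weight(y_val, ml_preds, gnn_preds, steps=20):
    """Assumed selection rule: pick w in [0, 1] minimizing validation RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(y_val, blend(ml_preds, gnn_preds, w)))


# Hypothetical stand-in for a trained model: any callable mapping SMILES -> float.
def toy_predict(smiles):
    return float(len(smiles))


# Three equivalent SMILES strings for ethanol, assumed to be pre-enumerated.
ethanol_variants = ["CCO", "OCC", "C(O)C"]
print("TTA prediction:", tta_mean(toy_predict, ethanol_variants))

# Hypothetical validation targets and per-model predictions (inhibition %).
y_val = [10.0, 40.0, 70.0]
ml_val = [12.0, 38.0, 74.0]
gnn_val = [6.0, 44.0, 62.0]
w = best_weight(y_val, ml_val, gnn_val)
print("selected ML weight:", w)
print("blended RMSE:", rmse(y_val, blend(ml_val, gnn_val, w)))
```

In the toy data the two models err in opposite directions, so the blend has a lower validation error than either model alone. That is the situation in which a weighted ensemble helps most; when the two models make similar errors, the gain is small.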

## Linked entities

- **Genes:** CYP3A4 (cytochrome P450 family 3 subfamily A member 4) [NCBI Gene 1576]

## Full-text entities

- **Genes:** CYP3A4 (cytochrome P450 family 3 subfamily A member 4) [NCBI Gene 1576] {aka CP33, CP34, CYP3A, CYP3A3, CYPIIIA3, CYPIIIA4}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}, PPIG (peptidylprolyl isomerase G) [NCBI Gene 9360] {aka CARS-Cyp, CYP, SCAF10, SRCyp}
- **Diseases:** toxicity (MESH:D064420), injury to (MESH:D014947)
- **Chemicals:** heme (MESH:D006418), Phe (MESH:D010649), CL (MESH:D002713), Bemis (-), VSA (MESH:C085726), pyridine (MESH:C023666), Trp (MESH:D014364), hydrogen (MESH:D006859), indole (MESH:C030374), pyrazine (MESH:D011719), triazole (MESH:D014230), piperidine (MESH:C032727), lipid (MESH:D008055), nitrogen (MESH:D009584), carbon (MESH:D002244), metal (MESH:D008670), oxygen (MESH:D010100), midazolam (MESH:D008874), Benzodioxole (MESH:D052117), Tyr (MESH:D014443), imidazole (MESH:C029899), water (MESH:D014867), testosterone (MESH:D013739)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12944684/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12944684/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC12944684/full.md

---
Source: https://tomesphere.com/paper/PMC12944684