# Machine Learning Exploration of Food-Derived Chemical Space for Potential Nutritional Metabolic Regulators Targeting Dipeptidyl Peptidase-4

**Authors:** Nada A. Alzunaidy

PMC · DOI: 10.3390/ph19030349 · Pharmaceuticals · 2026-02-24

## TL;DR

This study combines machine learning, molecular docking, and molecular dynamics simulations to prioritize food-derived compounds predicted to interact with DPP4, an enzyme involved in postprandial glucose regulation.

## Contribution

A scalable computational framework combining machine learning, molecular docking, and molecular dynamics simulations to prioritize food-derived compounds as candidate DPP4 regulators.

## Key findings

- The top-performing random forest model discriminated well between classes across independent scaffold-based splits (mean AUC 0.889 ± 0.017; average precision 0.959 ± 0.010).
- Seven food-derived compounds, including lipid-associated coenzyme A derivatives, occupied the canonical DPP4 binding site with favorable docking scores (−13.12 to −12.06 kcal/mol).
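- Extended molecular dynamics simulations (500 ns) indicated stable binding geometries and consistent engagement of key DPP4 residues (Glu205, Glu206, Arg125, and Tyr631) for the prioritized compounds.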
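
To make the reported metric concrete, the following is a minimal sketch of how a ROC AUC and a "mean ± standard deviation across splits" summary can be computed. It is not the author's code: the function names `roc_auc` and `summarize` are hypothetical, the toy numbers are invented for illustration, and the use of the sample standard deviation is an assumption, since this summary does not say which estimator the paper used.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC AUC in its Mann-Whitney form: the probability that a randomly
    chosen active (label 1) is scored above a randomly chosen inactive
    (label 0), with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(per_split_values):
    """Mean and sample standard deviation over independent splits
    (assumption: the paper's exact estimator is not stated here)."""
    return mean(per_split_values), stdev(per_split_values)


# Toy data, invented for illustration only (not from the paper).
toy_labels = [1, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)

# Toy per-split AUCs, again purely illustrative.
toy_mean, toy_sd = summarize([0.8, 0.9, 1.0])
```

An AUC measures ranking quality rather than accuracy: a value of 0.5 corresponds to random ranking and 1.0 to a perfect separation of actives from inactives.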

## Abstract

Background: Dipeptidyl peptidase-4 (DPP4) is a key metabolic enzyme involved in postprandial glucose regulation through incretin hormone modulation, making it an important target in nutrition and metabolic health research. Although dietary and plant-derived bioactive compounds have been reported to influence DPP4, exploration of the food-associated chemical space remains limited by its size and diversity. Methods: Here, we present an integrated computational framework combining machine learning, molecular docking, and molecular dynamics simulations to prioritize dietary and supplemental compounds with potential interaction capacity toward DPP4. Supervised classification models were trained on a curated DPP4 bioactivity dataset and evaluated using scaffold-based partitioning to ensure chemically realistic generalization. Results: The top-performing random forest model achieved robust performance across independent splits (mean AUC 0.889 ± 0.017; average precision 0.959 ± 0.010) and was applied to screen 69,574 food-derived compounds. Model interpretation identified recurring heteroaromatic and polar substructural features associated with predicted interaction propensity. Structure-based screening further prioritized seven food-derived compounds, including lipid-associated coenzyme A derivatives, which occupied the canonical DPP4 binding site with favorable docking scores (−13.12 to −12.06 kcal/mol). Extended molecular dynamics simulations (500 ns) demonstrated stable binding geometries, compact hydrogen-bond networks, and consistent engagement of key DPP4 residues, including Glu205, Glu206, Arg125, and Tyr631. Conclusions: Overall, our study provides a scalable computational strategy for identifying bioactive dietary and supplemental compounds with potential relevance to metabolic regulation. The framework supports nutraceutical research and functional food development by enabling targeted experimental investigation of diet–enzyme interactions.
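
The scaffold-based partitioning mentioned in the abstract can be illustrated with a minimal sketch. The idea is that all compounds sharing a core scaffold land on the same side of the split, so the test set contains chemotypes the model has not seen. The code below assumes that scaffold keys have already been computed for each compound (for example as Bemis-Murcko scaffold strings from a cheminformatics toolkit); the function name `scaffold_split`, the greedy assignment rule, and the toy scaffold labels are all hypothetical, and the paper's actual procedure may differ.

```python
import random
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2, seed=0):
    """Split compound indices so that no scaffold appears in both sets.

    scaffolds: list of scaffold keys, one per compound (index-aligned).
    Returns (train_indices, test_indices), each sorted.
    """
    # Group compound indices by their scaffold key.
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)

    # Shuffle scaffold groups reproducibly; a different seed gives
    # a different independent split.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    # Greedily move whole groups into the test set until it is full.
    target = test_fraction * len(scaffolds)
    train, test = [], []
    for key in keys:
        members = groups[key]
        if len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(test)


# Toy scaffold labels, invented for illustration only.
toy_scaffolds = ["A", "A", "B", "C", "C", "C", "D", "E", "E", "F"]
train_idx, test_idx = scaffold_split(toy_scaffolds, test_fraction=0.3, seed=1)
```

Repeating the call with several seeds yields the kind of independent splits over which a mean and standard deviation of AUC or average precision can be reported. Compared with a random split, this design tends to give more conservative performance estimates, because close analogues of test compounds are kept out of the training set.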

## Linked entities

- **Proteins:** DPP4 (dipeptidyl peptidase 4)
- **Chemicals:** coenzyme A (PubChem CID 87642)

## Full-text entities

- **Genes:** DPP4 (dipeptidyl peptidase 4) [NCBI Gene 1803] {aka ADABP, ADCP2, CD26, DPPIV, TP103}
- **Chemicals:** hydrogen (MESH:D006859), coenzyme A (MESH:D003065), lipid (MESH:D008055), glucose (MESH:D005947)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13029763/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13029763/full.md

## References

67 references — full list in the complete paper: https://tomesphere.com/paper/PMC13029763/full.md

---
Source: https://tomesphere.com/paper/PMC13029763