# Using Machine Learning for the Discovery and Development of Multitarget Flavonoid-Based Functional Products in MASLD

**Authors:** Maksim Kuznetsov, Evgeniya Klein, Daria Velina, Sherzodkhon Mutallibzoda, Olga Orlovtseva, Svetlana Tefikova, Dina Klyuchnikova, Igor Nikitin

PMC · DOI: 10.3390/molecules30214159 · 2025-10-22

## TL;DR

This paper introduces a fully in silico, machine learning-based pipeline for designing multi-target, flavonoid-based nutraceutical products aimed at MASLD.

## Contribution

An in silico pipeline that integrates molecular activity prediction, systemic aggregation of the predicted activities, and formulation design for multi-target nutraceutical development in MASLD.

## Key findings

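- A stacked ensemble of Random Forest, XGBoost, and CatBoost with isotonic calibration reached a mean cross-validated ROC-AUC of 0.834 and an independent test ROC-AUC of 0.840 when predicting bioactivity against the ten MASLD targets.
- Three prototype nutraceutical concepts (HepatoBlend, LiverGuard Tea, and HDL-Chews) were designed, with PBPK modeling used to derive minimum simulated doses (MSDs) and chrono-specific exposure (%T>IC50).
- Total activity (TA) and weighted TA metrics were combined with structural clustering (six structural clusters, twelve MOA clusters) to keep candidates chemically diverse, while physicochemical descriptors (MW, logP, TPSA) guided carrier and encapsulation choices.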
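
To make the dosing idea concrete, the sketch below estimates the percentage of a dosing interval during which a simulated concentration stays above an IC50, and then picks the smallest candidate dose that reaches a target percentage. It is a deliberate simplification and not the paper's PBPK model: it assumes a single oral dose in a one-compartment model with first-order absorption (the Bateman equation). All function names, parameter values, and units here (`f_bio`, `ka`, `ke`, `vd_l`, mg/L) are hypothetical illustrations.

```python
import math


def concentration(t_h, dose_mg, *, f_bio, ka, ke, vd_l):
    """Concentration (mg/L) t_h hours after one oral dose (Bateman equation)."""
    if math.isclose(ka, ke):
        raise ValueError("ka and ke must differ for this closed form")
    scale = f_bio * dose_mg * ka / (vd_l * (ka - ke))
    return scale * (math.exp(-ke * t_h) - math.exp(-ka * t_h))


def pct_time_above(dose_mg, ic50_mg_l, *, interval_h=24.0, step_h=0.05, **pk):
    """Percent of the interval with concentration above the IC50.

    The interval is sampled at the midpoint of each time step.
    """
    n_steps = int(round(interval_h / step_h))
    above = sum(
        1
        for i in range(n_steps)
        if concentration((i + 0.5) * step_h, dose_mg, **pk) > ic50_mg_l
    )
    return 100.0 * above / n_steps


def minimum_dose(target_pct, ic50_mg_l, candidate_doses_mg, **pk):
    """Smallest candidate dose reaching target_pct coverage, or None."""
    for dose in sorted(candidate_doses_mg):
        if pct_time_above(dose, ic50_mg_l, **pk) >= target_pct:
            return dose
    return None


# Hypothetical pharmacokinetic parameters, chosen only for illustration.
pk = {"f_bio": 0.2, "ka": 1.0, "ke": 0.2, "vd_l": 50.0}

for dose in (100, 250, 500):
    print(dose, "mg ->", round(pct_time_above(dose, 0.5, **pk), 1), "% of 24 h")

print("smallest dose with >= 25 % coverage:",
      minimum_dose(25.0, 0.5, [100, 250, 500], **pk))
```

Because concentration scales linearly with dose in this model, coverage never decreases as the dose grows, so scanning the candidates in ascending order is enough. A full PBPK model would replace `concentration` with a multi-compartment simulation and repeated dosing.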

## Abstract

Metabolic dysfunction-associated steatotic liver disease (MASLD) represents a multifactorial condition requiring multi-target therapeutic strategies beyond traditional single-marker approaches. In this work, we present a fully in silico nutraceutical screening pipeline that integrates molecular prediction, systemic aggregation, and technological design. A curated panel of ten MASLD-relevant targets, spanning nuclear receptors (FXR, PPAR-α/γ, THR-β), lipogenic and cholesterogenic enzymes (ACC1, FASN, DGAT2, HMGCR), and transport/regulatory proteins (LIPG, FABP4), was assembled from proteomic evidence. Bioactivity records were extracted from ChEMBL, structurally standardized, and converted into RDKit descriptors. Predictive modeling employed a stacked ensemble of Random Forest, XGBoost, and CatBoost with isotonic calibration, yielding robust performance (mean cross-validated ROC-AUC 0.834; independent test ROC-AUC 0.840). Calibrated probabilities were aggregated into total activity (TA) and weighted TA metrics, combined with structural clustering (six structural clusters, twelve MOA clusters) to ensure chemical diversity. We used physiologically based pharmacokinetic (PBPK) modeling to translate probabilistic profiles into minimum simulated doses (MSDs) and chrono-specific exposure (%T>IC50) for three prototype concepts: HepatoBlend (morning powder), LiverGuard Tea (evening aqueous form), and HDL-Chews (postprandial chew). Integration of physicochemical descriptors (MW, logP, TPSA) guided carrier and encapsulation choices, addressing stability and sensory constraints. The results demonstrate that a computationally integrated pipeline can rationally generate multi-target nutraceutical formulations, linking molecular predictions with systemic coverage and practical formulation specifications, and thus provides a transferable framework for MASLD and related metabolic conditions.
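
The aggregation and diversity steps can be sketched as follows. The abstract does not give the formulas for TA or weighted TA, so this sketch assumes that TA is the sum of calibrated probabilities over the ten targets and that weighted TA is the corresponding weighted sum; the paper's definitions may differ. The compound names, probabilities, weights, cluster labels, and the helper `pick_diverse` are hypothetical placeholders, not values or methods from the paper.

```python
from collections import defaultdict

# Target labels as they appear in the abstract.
TARGETS = ["FXR", "PPAR-α", "PPAR-γ", "THR-β", "ACC1",
           "FASN", "DGAT2", "HMGCR", "LIPG", "FABP4"]


def total_activity(probs):
    """Assumed TA: sum of calibrated probabilities over the target panel."""
    return sum(probs[t] for t in TARGETS)


def weighted_total_activity(probs, weights):
    """Assumed weighted TA: per-target weights applied before summing."""
    return sum(weights[t] * probs[t] for t in TARGETS)


def pick_diverse(compounds, weights, per_cluster=1):
    """Keep the top-scoring compounds from each structural cluster."""
    by_cluster = defaultdict(list)
    for compound in compounds:
        by_cluster[compound["cluster"]].append(compound)
    picked = []
    for cluster_id in sorted(by_cluster):
        ranked = sorted(
            by_cluster[cluster_id],
            key=lambda c: weighted_total_activity(c["probs"], weights),
            reverse=True,
        )
        picked.extend(ranked[:per_cluster])
    return picked


# Hypothetical inputs: uniform weights and made-up probability profiles.
weights = {t: 1.0 for t in TARGETS}
compounds = [
    {"name": "compound_A", "cluster": 0, "probs": {t: 0.2 for t in TARGETS}},
    {"name": "compound_B", "cluster": 0, "probs": {t: 0.6 for t in TARGETS}},
    {"name": "compound_C", "cluster": 1, "probs": {t: 0.4 for t in TARGETS}},
]

for compound in pick_diverse(compounds, weights):
    print(compound["name"], round(total_activity(compound["probs"]), 2))
```

Selecting per cluster rather than by global rank is one simple way to stop a single scaffold from dominating the shortlist; the weights would let some targets count more than others if there were a rationale for ranking them.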

## Linked entities

- **Proteins:** NR1H4 (nuclear receptor subfamily 1 group H member 4), PPARA (peroxisome proliferator activated receptor alpha), PPARG (peroxisome proliferator activated receptor gamma), THRB (thyroid hormone receptor beta), ACACA (acetyl-CoA carboxylase alpha), FASN (fatty acid synthase), DGAT2 (diacylglycerol O-acyltransferase 2), HMGCR (3-hydroxy-3-methylglutaryl-CoA reductase), LIPG (lipase G, endothelial type), FABP4 (fatty acid binding protein 4)
- **Diseases:** MASLD (MONDO:0013209)

## Full-text entities

- **Genes:** FABP4 (fatty acid binding protein 4) [NCBI Gene 2167] {aka A-FABP, AFABP, ALBP, HEL-S-104, aP2}, FASN (fatty acid synthase) [NCBI Gene 2194] {aka FAS, OA-519, SDR27X1}, LIPG (lipase G, endothelial type) [NCBI Gene 9388] {aka EDL, EL, PRO719}, NR1H4 (nuclear receptor subfamily 1 group H member 4) [NCBI Gene 9971] {aka BAR, FXR, HRR-1, HRR1, PFIC5, RIP14}, ACACA (acetyl-CoA carboxylase alpha) [NCBI Gene 31] {aka ACC, ACC1, ACCA}, DGAT2 (diacylglycerol O-acyltransferase 2) [NCBI Gene 84649] {aka ARAT, GS1999FULL, HMFN1045}, THRB (thyroid hormone receptor beta) [NCBI Gene 7068] {aka C-ERBA-2, C-ERBA-BETA, ERBA2, GRTH, NR1A2, PRTH}, HMGCR (3-hydroxy-3-methylglutaryl-CoA reductase) [NCBI Gene 3156] {aka LDLCQ3, LGMDR28, MYPLG}
- **Diseases:** MASLD (MESH:D008107)
- **Chemicals:** Flavonoid (MESH:D005419)

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12609199/full.md

---
Source: https://tomesphere.com/paper/PMC12609199