# Predicting the risk of metabolic-associated fatty liver disease in the elderly population in China: construction and evaluation of interpretable machine learning models

**Authors:** Yingxin Zeng, Chaobing Yang, Xin Yang, Xinmei Zhang, Guodong Xia

PMC · DOI: 10.3389/fmed.2025.1678076 · Frontiers in Medicine · 2025-10-20

## TL;DR

This study builds and compares 10 machine learning models that predict the risk of metabolic-associated fatty liver disease (MAFLD) in elderly Chinese individuals from routine health examination data.

## Contribution

The study introduces an interpretable random forest model optimized for MAFLD risk prediction in the elderly using routine health examination data.

## Key findings

- The random forest model achieved an AUC of 0.892 (95% CI: 0.870–0.914) in the validation cohort, the best result among the 10 algorithms compared.
- The TyG-BMI index, height, and albumin levels were identified as key predictors.
- The Shapley Additive exPlanations (SHAP) method provided individual-level interpretability for model predictions.
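
The TyG-BMI index named in the key findings is not defined in this summary. A minimal sketch follows, assuming the commonly used definition: TyG = ln(fasting triglycerides [mg/dL] × fasting glucose [mg/dL] / 2), multiplied by BMI in kg/m². The function names, the unit-conversion constants, and the example values are illustrative assumptions, not taken from the paper.

```python
import math

# Conventional, approximate conversion factors (assumption: labs reported in mmol/L).
TG_MMOL_TO_MG_DL = 88.57
GLUCOSE_MMOL_TO_MG_DL = 18.016


def tyg_index(triglycerides_mg_dl: float, fasting_glucose_mg_dl: float) -> float:
    """Triglyceride-glucose index: ln(TG [mg/dL] * FPG [mg/dL] / 2)."""
    if triglycerides_mg_dl <= 0 or fasting_glucose_mg_dl <= 0:
        raise ValueError("lab values must be positive")
    return math.log(triglycerides_mg_dl * fasting_glucose_mg_dl / 2.0)


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def tyg_bmi(triglycerides_mg_dl: float, fasting_glucose_mg_dl: float,
            weight_kg: float, height_m: float) -> float:
    """TyG-BMI: the TyG index multiplied by BMI."""
    return tyg_index(triglycerides_mg_dl, fasting_glucose_mg_dl) * bmi(weight_kg, height_m)


# Hypothetical individual; values are made up for illustration only.
example = tyg_bmi(triglycerides_mg_dl=1.7 * TG_MMOL_TO_MG_DL,
                  fasting_glucose_mg_dl=5.6 * GLUCOSE_MMOL_TO_MG_DL,
                  weight_kg=70.0, height_m=1.65)
print(f"TyG-BMI: {example:.1f}")
```

Because height enters TyG-BMI through BMI, a model that also uses height as a separate feature sees it twice; whether that affects the reported feature ranking is not addressed in this summary.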

## Abstract

With the rising incidence of metabolic dysfunction-associated fatty liver disease (MAFLD) in the elderly population, this study aimed to develop an optimal screening model by comparing ten different machine learning (ML) algorithms to identify high-risk elderly individuals using routine health examination data.

The study included 2,635 individuals aged 60 years and older who underwent annual health examinations at the Health Management Center of Southwest Medical University Affiliated Hospital from January to December 2024. Initial feature selection was performed using the least absolute shrinkage and selection operator (LASSO) regression, followed by univariate and multivariate logistic regression analysis to identify nine independent predictive factors. Predictive models were constructed using 10 ML algorithms, and model performance was evaluated based on discriminative ability, calibration ability, and clinical utility. Feature importance was visualized and individual-level interpretability was provided using the Shapley Additive exPlanations (SHAP) method.
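
To illustrate what individual-level interpretability means here, the sketch below computes exact Shapley values for one person by enumerating every feature coalition. It is a simplification, not the authors' pipeline: features outside a coalition are fixed at a single baseline profile, whereas SHAP implementations typically average over background data. The toy scoring function, its coefficients, and both feature profiles are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley attributions for predict(instance) - predict(baseline)."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        # Coalition features take the individual's values; the rest stay at baseline.
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_score(x):
    # Arbitrary coefficients chosen for illustration; NOT estimates from the paper.
    return (0.010 * x["tyg_bmi"] - 0.005 * x["height"] - 0.020 * x["albumin"]
            + 0.0001 * x["tyg_bmi"] * x["albumin"])


# Hypothetical individual and hypothetical reference profile.
instance = {"tyg_bmi": 260.0, "height": 158.0, "albumin": 41.0}
baseline = {"tyg_bmi": 220.0, "height": 162.0, "albumin": 44.0}

for feature, contribution in shapley_values(toy_score, instance, baseline).items():
    print(f"{feature}: {contribution:+.4f}")
```

The attributions sum to the difference between the individual's score and the baseline score, which is the property that lets a per-person explanation be read as a breakdown of that person's prediction. Enumeration costs grow exponentially with the number of features, so this is workable for the nine predictors in the study but not for wide feature sets.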

The final analysis included nine variables. After 10-fold cross-validation and hyperparameter tuning, the Random Forest (RF) model performed best, achieving an area under the curve (AUC) of 0.892 (95% CI: 0.870–0.914) in the validation cohort. Feature importance analysis revealed that the TyG-BMI index, height, and albumin levels played significant roles in predicting MAFLD risk.
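
The summary reports the AUC with a 95% confidence interval but does not say how the interval was computed. As a hedged illustration, the sketch below computes the AUC as the probability that a randomly chosen positive case outranks a randomly chosen negative one, then derives a percentile bootstrap interval, which is one common choice (DeLong's method is another). All data are synthetic and the function names are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC: probability a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that lost one of the classes
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic validation set: 60 cases and 60 controls with overlapping scores.
rng = random.Random(42)
labels = [1] * 60 + [0] * 60
scores = [rng.gauss(0.65, 0.15) for _ in range(60)] + [rng.gauss(0.40, 0.15) for _ in range(60)]

point = auc(labels, scores)
low, high = bootstrap_auc_ci(labels, scores, n_boot=300)
print(f"AUC {point:.3f} (95% CI: {low:.3f}-{high:.3f})")
```

The pairwise comparison is quadratic in sample size, which is fine for a sketch; a rank-based formula would be preferable for a cohort the size of the one in this study.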

Machine learning models, particularly the random forest algorithm, can effectively predict the risk of MAFLD in the elderly population. These models may assist clinicians in early screening and intervention, thereby improving patient outcomes.

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}
- **Diseases:** MAFLD (MESH:D005234)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12580202/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12580202/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12580202/full.md

---
Source: https://tomesphere.com/paper/PMC12580202