# Machine learning prediction of metabolic-associated fatty liver disease in type 2 diabetes: Emphasizing data imputation and feature selection

**Authors:** Zahra Khosravi, Farnaz Barzinpour, Soghra Rabizadeh, Manouchehr Nakhjavani, Alireza Esteghamati

PMC · DOI: 10.1371/journal.pone.0339580 · PLOS One · 2026-02-24

## TL;DR

This study uses machine learning to predict metabolic-associated fatty liver disease (MAFLD) in patients with type 2 diabetes (T2DM), emphasizing data imputation and feature selection techniques.

## Contribution

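The study builds a machine learning pipeline for predicting MAFLD in T2DM patients. It compares imputation methods on simulated missingness in a complete subset of the data and evaluates four feature selection methods across eight models.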

## Key findings

- The XGBoost classifier without feature selection performed best, with an accuracy of 80.6% and an AUC of 88.9% in predicting MAFLD.
- ALT, PLT, and VitD were identified as influential features in the model.
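- Imputation methods were evaluated on simulated missingness in a complete subset of the dataset, and four feature selection methods were tested across eight models; the best-performing model used no feature selection.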
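
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how accuracy and AUC can be computed from predicted probabilities. The labels and scores are made-up illustrative values, not data from the study, and the 0.5 decision threshold is an assumption; the paper's own evaluation code is not shown in this summary.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of cases where the thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = MAFLD, 0 = no MAFLD) and model scores.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]

acc = accuracy(y_true, y_score)
auc = roc_auc(y_true, y_score)
print(f"accuracy={acc:.3f}, AUC={auc:.3f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two figures can differ for the same model.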

## Abstract

Metabolic-Associated Fatty Liver Disease (MAFLD) is common among Type 2 Diabetes (T2DM) patients. The coexistence of these conditions increases the risk of MAFLD progression and diabetes complications. Detecting MAFLD early is challenging due to its asymptomatic initial stages. In this study, we aimed to develop a machine learning model to predict MAFLD in T2DM patients. We conducted a cross-sectional study on 3,654 Iranian T2DM patients using their demographic and lab data. This study involved thorough data preprocessing, including evaluating various imputation methods on simulated missingness in a complete subset of the dataset. Additionally, four feature selection methods were applied to eight machine learning models to identify the most effective predictive model. The XGBoost classifier without feature selection achieved the best performance, with an accuracy of 80.6% and an area under the receiver operating characteristic curve (AUC) of 88.9%. Notably, certain features, such as alanine aminotransferase (ALT), platelet count (PLT), and Vitamin D (VitD), influenced the predictive performance.
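
The idea of evaluating imputation methods on simulated missingness can be sketched as follows: take data with no missing values, hide some cells, impute them, and score each method against the hidden true values. This is only an illustration under stated assumptions. The data are synthetic, missingness is simulated completely at random at a 20% rate, and only mean and median imputation are compared; the abstract does not name the methods, missingness mechanism, or error metric the authors used.

```python
import math
import random
import statistics


def simulate_missing(rows, rate, rng):
    """Copy rows and blank out cells at random; return the copy and masked positions."""
    masked = [row[:] for row in rows]
    positions = []
    for i, row in enumerate(masked):
        for j in range(len(row)):
            if rng.random() < rate:
                row[j] = None
                positions.append((i, j))
    return masked, positions


def impute_columns(rows, stat):
    """Fill each missing cell with a per-column statistic of the observed values."""
    n_cols = len(rows[0])
    fills = [
        stat([r[j] for r in rows if r[j] is not None])
        for j in range(n_cols)
    ]
    return [[fills[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def rmse_on_masked(truth, imputed, positions):
    """Root mean squared error, measured only on the cells that were hidden."""
    sq_errors = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in positions]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


rng = random.Random(0)
# Synthetic stand-in for a complete subset: one symmetric and one skewed column.
complete = [[rng.gauss(100, 15), rng.lognormvariate(3, 0.6)] for _ in range(500)]
masked, positions = simulate_missing(complete, rate=0.2, rng=rng)

# Hypothetical candidate methods; a real comparison would include more.
candidates = {"mean": statistics.fmean, "median": statistics.median}
scores = {
    name: rmse_on_masked(complete, impute_columns(masked, stat), positions)
    for name, stat in candidates.items()
}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Because the true values of the hidden cells are known, the comparison is objective. In practice the error would usually be computed per feature or on standardized values so that features with large scales do not dominate.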

## Linked entities

- **Diseases:** Type 2 Diabetes (MONDO:0005148)

## Full-text entities

- **Genes:** INS (insulin) [NCBI Gene 3630] {aka IDDM, IDDM1, IDDM2, ILPR, IRDN, MODY10}, CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}, SLC17A5 (solute carrier family 17 member 5) [NCBI Gene 26503] {aka AST, ISSD, NSD, SD, SIALIN, SIASD}, PLIN2 (perilipin 2) [NCBI Gene 123] {aka ADFP, ADRP}, GPT (glutamic--pyruvic transaminase) [NCBI Gene 2875] {aka AAT1, ALT, ALT1, GPT1, SGPT}, AIP (AHR interacting HSP90 co-chaperone) [NCBI Gene 9049] {aka ARA9, FKBP16, FKBP37, PITA1, SMTPHN, XAP-2}
- **Diseases:** NAFLD (MESH:D065626), MNAR (MESH:D000030), Diabetes (MESH:D003920), cardiovascular and chronic kidney diseases (MESH:D051436), insulin resistance (MESH:D007333), liver inflammation (MESH:D007249), liver-related diseases (MESH:D008107), cirrhosis (MESH:D005355), HTN (MESH:D006973), metabolic dysregulation (MESH:D021081), fat accumulation (MESH:D004620), T2DM (MESH:D003924), -associated fatty liver disease (MESH:D005234), NASH (MESH:D005235), obese (MESH:D009765), Retino (MESH:D058437), CVA (MESH:D020521), overweight (MESH:D050177), CAD (MESH:D003324), liver disorder (MESH:D017093)
- **Chemicals:** UA (MESH:D014527), TG (MESH:D014280), 2-Hour (-), Vit D (MESH:D014807), Tg (MESH:D013866), sugar (MESH:D000073893), alcohol (MESH:D000438), Cr (MESH:D003404), Glucose (MESH:D005947), Chl (MESH:D002784), blood sugar (MESH:D001786)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12931757/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12931757/full.md

## References

55 references — full list in the complete paper: https://tomesphere.com/paper/PMC12931757/full.md

---
Source: https://tomesphere.com/paper/PMC12931757