# Investigating Factors Influencing Disease Progression in Patients With Non-Alcoholic Fatty Liver Disease

**Authors:** Yi-Chieh Tseng, Rewadee Jenraumjit, Ming-Jong Bair, Chung-Yu Chen, Fu-Shih Chen

PMC · DOI: 10.14740/jocmr6424 · 2026-02-28

## TL;DR

This study identifies risk factors for disease progression in patients with non-alcoholic fatty liver disease (NAFLD) by applying unsupervised clustering and survival analysis to electronic medical records.

## Contribution

The study combines k-prototype clustering, survival analysis, and Lasso-based variable selection to identify distinct NAFLD phenotypes and the risk factors associated with them.

## Key findings

- Four distinct NAFLD phenotypic clusters were identified from 6,023 patients.
- Among 4,998 patients, the highest-risk cluster had a median survival of 3.06 years, significantly shorter than that of the other clusters.
- Comparing the highest-risk and lowest-risk clusters yielded 17 potential variables associated with disease progression.

## Abstract

With no approved pharmacological treatments for non-alcoholic fatty liver disease (NAFLD) in Taiwan, identifying protective and risk factors is crucial for preventing disease progression. Given the clinical heterogeneity of NAFLD, this study aimed to identify clinically meaningful NAFLD phenotypes using electronic medical records (EMRs) and unsupervised clustering, stratify risk across different clusters, identify factors associated with disease progression, and derive a parsimonious set of predictors for high-risk phenotypes.

This study was a retrospective cohort study conducted in three steps with iterative model training. In step 1, patients diagnosed with NAFLD were identified, and all relevant patient data were extracted, followed by clustering analysis using the k-prototype algorithm. In step 2, survival analysis and Cox regression were applied to perform risk stratification across clusters. In step 3, Lasso regression, logistic regression, and receiver operating characteristic (ROC) curve analysis were used to identify potential protective and risk factors associated with NAFLD and to derive a parsimonious set of predictors for high-risk phenotypes across different risk strata.
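The clustering in step 1 relies on the k-prototype algorithm, which handles mixed numeric and categorical data. A minimal sketch of that technique follows, assuming the standard Huang formulation (squared Euclidean distance on numeric features plus a weight `gamma` times the number of categorical mismatches). The records, the initial prototypes, and the value of `gamma` are invented for illustration and are not taken from the study; the abstract does not say which implementation or settings the authors used.

```python
from collections import Counter


def mixed_distance(num_a, cat_a, num_b, cat_b, gamma):
    """Squared Euclidean distance on numeric features plus
    gamma times the number of categorical mismatches."""
    numeric = sum((a - b) ** 2 for a, b in zip(num_a, num_b))
    mismatches = sum(a != b for a, b in zip(cat_a, cat_b))
    return numeric + gamma * mismatches


def assign(records, prototypes, gamma):
    """Label each (numeric, categorical) record with its nearest prototype."""
    return [
        min(
            range(len(prototypes)),
            key=lambda k: mixed_distance(num, cat, *prototypes[k], gamma),
        )
        for num, cat in records
    ]


def update(records, labels, prototypes):
    """Recompute prototypes: mean of numeric columns, mode of categorical ones.
    An empty cluster keeps its previous prototype."""
    new_prototypes = []
    for k, old in enumerate(prototypes):
        members = [r for r, label in zip(records, labels) if label == k]
        if not members:
            new_prototypes.append(old)
            continue
        nums = [sum(col) / len(col) for col in zip(*(m[0] for m in members))]
        cats = [
            Counter(col).most_common(1)[0][0]
            for col in zip(*(m[1] for m in members))
        ]
        new_prototypes.append((nums, cats))
    return new_prototypes


def k_prototypes(records, init_prototypes, gamma, n_iter=10):
    """Alternate assignment and prototype updates until labels stop changing."""
    prototypes = list(init_prototypes)
    labels = assign(records, prototypes, gamma)
    for _ in range(n_iter):
        prototypes = update(records, labels, prototypes)
        new_labels = assign(records, prototypes, gamma)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, prototypes


# Hypothetical toy records: two standardized numeric values and one categorical flag.
records = [
    ([0.1, 0.2], ["no"]),
    ([0.0, 0.3], ["no"]),
    ([2.0, 2.1], ["yes"]),
    ([2.2, 1.9], ["yes"]),
]
labels, prototypes = k_prototypes(
    records, init_prototypes=[records[0], records[2]], gamma=1.0
)
```

In practice the choice of `gamma` controls how much categorical disagreement counts relative to numeric distance, so numeric features are usually standardized first.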

Step 1: The analysis of 6,023 patients identified four distinct phenotypic clusters. The first cluster had the most severe disease and the second the least severe. Step 2: Among 4,998 patients, the first cluster faced the highest risk for all outcomes, with a median survival of 3.06 years, significantly different from the other clusters. There was no significant difference in risk between the second and third clusters. Step 3: A comparison of the highest-risk and lowest-risk clusters identified 17 potential variables.
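A median survival time such as the one reported for the first cluster is commonly read off a Kaplan-Meier curve as the earliest time at which estimated survival falls to 50% or below. The abstract does not name the estimator or software used, so the sketch below is only an illustration of that general technique. The durations and event flags are invented toy values, not study data.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each distinct event time.
    events[i] is True when the outcome was observed and False when censored."""
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        observed = 0
        removed = 0
        # Group all subjects sharing the same time point.
        while i < len(pairs) and pairs[i][0] == t:
            observed += int(pairs[i][1])
            removed += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve):
    """Earliest time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up times in years; False marks a censored patient.
durations = [1.0, 2.0, 2.5, 3.0, 4.0, 5.0]
events = [True, False, True, True, False, True]

curve = kaplan_meier(durations, events)
median = median_survival(curve)
```

Censored patients leave the risk set without lowering the survival estimate, which is why the curve only steps down at observed event times.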

Using multiple analytical models, this study identified 17 potential risk factors associated with NAFLD progression. Their combined assessment may inform future risk stratification and hypothesis generation. Further validation is required before clinical application.

## Linked entities

- **Diseases:** non-alcoholic fatty liver disease (MONDO:0013209), NAFLD (MONDO:0013209)

## Full-text entities

- **Diseases:** NAFLD (MESH:D065626)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12978391/full.md

---
Source: https://tomesphere.com/paper/PMC12978391