Predicting Metabolic Dysfunction-Associated Steatotic Liver Disease using Machine Learning Methods: A Retrospective Cohort Study
Mary E. An, Paul M. Griffin, Jonathan G. Stine, Balakrishnan S. Ramakrishna, Soundar R.T. Kumara

TL;DR
This study develops and evaluates a machine learning-based prediction model for early detection of MASLD using EHR data, emphasizing fairness across racial groups and demonstrating competitive performance.
Contribution
Introduces MASER, an interpretable and fair EHR-based prediction model for MASLD, utilizing a limited feature set and addressing disparities in true positive rates.
Findings
LASSO logistic regression with the top-ranked features achieved an AUROC of 0.84.
The fairness adjustment increased accuracy to 81% and specificity to 94%.
Model performance is comparable to that of more complex ensemble and tree-based models.
Abstract
Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) affects 30-40% of US adults and is the most common chronic liver disease. Although often asymptomatic, progression can lead to cirrhosis. The objective of the study was to develop and evaluate an electronic health record (EHR)-based prediction model to support early detection of MASLD in primary care settings.
Methods: We evaluated LASSO logistic regression, random forest, XGBoost, and a neural network model for MASLD prediction using clinical feature subsets from a large EHR database, including the top 10 ranked features. To reduce disparities in true positive rates across racial and ethnic subgroups, we applied an equal opportunity postprocessing method in a prediction model called MASLD EHR Static Risk Prediction (MASER).
Results: This retrospective cohort study included 59,492 participants in the training…
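The abstract does not spell out how the equal opportunity postprocessing step is implemented, so the following is only a minimal sketch of one common way to do it: choosing a separate decision threshold per subgroup so that the true positive rate (TPR) is the same in each. The function names, the toy scores, and the target TPR of 0.75 are all hypothetical and are not taken from the paper or from MASER.

```python
import math
from collections import defaultdict


def true_positive_rate(scores, labels, threshold):
    """Fraction of actual positives (label 1) scored at or above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        raise ValueError("no positive cases")
    return sum(s >= threshold for s in positives) / len(positives)


def equal_opportunity_thresholds(scores, labels, groups, target_tpr):
    """Pick one threshold per group so each group's TPR reaches target_tpr.

    Illustrative assumption: the threshold is set to the k-th highest score
    among that group's positives, where k = ceil(target_tpr * n_positives).
    """
    pos_scores = defaultdict(list)
    for s, y, g in zip(scores, labels, groups):
        if y == 1:
            pos_scores[g].append(s)

    thresholds = {}
    for g, pos in pos_scores.items():
        pos.sort(reverse=True)
        k = min(max(math.ceil(target_tpr * len(pos)), 1), len(pos))
        thresholds[g] = pos[k - 1]
    return thresholds


# Toy data (hypothetical): model risk scores, true labels, subgroup membership.
scores = [0.9, 0.8, 0.6, 0.4, 0.5, 0.1, 0.7, 0.5, 0.3, 0.2, 0.4, 0.1]
labels = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
groups = ["A"] * 6 + ["B"] * 6


def group_tpr(group, threshold):
    """TPR restricted to one subgroup of the toy data."""
    s = [x for x, g in zip(scores, groups) if g == group]
    y = [x for x, g in zip(labels, groups) if g == group]
    return true_positive_rate(s, y, threshold)


thresholds = equal_opportunity_thresholds(scores, labels, groups, target_tpr=0.75)

# A single shared cutoff of 0.5 gives different TPRs for groups A and B;
# the per-group thresholds bring both to the same TPR.
for g in ("A", "B"):
    print(g, "shared:", group_tpr(g, 0.5), "adjusted:", group_tpr(g, thresholds[g]))
```

In this sketch the thresholds are fitted on the same data they are evaluated on, which is only acceptable for illustration; in practice they would be chosen on a held-out set, and shifting a group's threshold also changes that group's specificity and overall accuracy.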
