Large Language Multimodal Models for 5-Year Chronic Disease Cohort Prediction Using EHR Data

Jun-En Ding; Phan Nguyen Minh Thao; Wen-Chih Peng; Jian-Zhe Wang; Chun-Cheng Chug; Min-Chen Hsieh; Yun-Chien Tseng; Ling Chen; Dongsheng Luo; Chi-Te Wang; Pei-fu Chen; Feng Liu; and Fang-Ming Hung

arXiv:2403.04785·cs.CL·September 2, 2024·1 cites

Open Access

TL;DR

This paper introduces a novel multimodal large language model framework that integrates clinical notes and laboratory data from EHRs to improve chronic disease prediction accuracy, especially for early-stage diabetes.

Contribution

It presents a new multimodal LLM approach combining clinical text and lab data, utilizing advanced attention mechanisms and pre-trained models for improved disease risk prediction.

Findings

01

Achieved 73% accuracy in multiclass chronic disease prediction.

02

Attained 76% AUROC in diabetes prediction using textual lab data.

03

Significantly enhanced early-stage diabetes prediction accuracy.
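
For readers less familiar with the two metrics reported above, the following is a minimal sketch of how accuracy and AUROC are commonly computed. The function names and the toy labels and scores are illustrative assumptions; they are not taken from the paper or its data.

```python
def accuracy(true_classes, predicted_classes):
    """Fraction of examples whose predicted class equals the true class."""
    matches = sum(t == p for t, p in zip(true_classes, predicted_classes))
    return matches / len(true_classes)


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy inputs (hypothetical, for illustration only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)

toy_true = [0, 1, 2, 2]
toy_pred = [0, 1, 1, 2]
toy_accuracy = accuracy(toy_true, toy_pred)
```

The pairwise form is quadratic in the number of examples; it is written this way to make the definition visible, not for efficiency on large cohorts.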

Abstract

Chronic diseases such as diabetes are leading causes of morbidity and mortality worldwide. Numerous studies have applied deep learning models to their diagnosis. However, most previous studies had limitations, including reliance on publicly available datasets (e.g., MIMIC) and imbalanced data. In this study, we collected five-year electronic health records (EHRs) from the Taiwan hospital database, including 1,420,596 clinical notes, 387,392 laboratory test results, and more than 1,505 laboratory test items, for research on pre-training large language models. We proposed a novel Large Language Multimodal Models (LLMMs) framework incorporating multimodal data from clinical notes and laboratory test results for the prediction of chronic disease risk. Our method combined a text embedding encoder and a multi-head attention layer to learn laboratory test…
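
The abstract is truncated here, so the exact architecture is not recoverable from this page. As a rough illustration of the ingredients it names (a text embedding, multi-head attention over laboratory tests, and a linear layer with softmax), here is a minimal NumPy sketch. Everything in it is an assumption made for illustration: the dimensions, the random stand-in embeddings, the untrained weights, the omission of learned attention projections, and the choice to let the note embedding attend over the lab embeddings. It is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(query, keys, values, num_heads):
    """Scaled dot-product attention split across heads.
    Learned projection matrices are omitted to keep the sketch short."""
    d_model = query.shape[-1]
    assert d_model % num_heads == 0
    d_head = d_model // num_heads

    def split(m):
        # (tokens, d_model) -> (heads, tokens, d_head)
        return m.reshape(m.shape[0], num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(query), split(keys), split(values)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    heads = weights @ v  # (heads, n_query, d_head)
    # Merge heads back into a single (n_query, d_model) matrix.
    return heads.transpose(1, 0, 2).reshape(query.shape[0], d_model)


rng = np.random.default_rng(0)
d_model, num_heads, num_classes = 16, 4, 3  # illustrative sizes, not from the paper

note_embedding = rng.normal(size=(1, d_model))  # stand-in for an encoded clinical note
lab_embeddings = rng.normal(size=(5, d_model))  # stand-in for five embedded lab tests

# Assumed fusion: the note queries the lab tests, then both views are concatenated.
lab_context = multi_head_attention(note_embedding, lab_embeddings, lab_embeddings, num_heads)
fused = np.concatenate([note_embedding, lab_context], axis=-1)

w_out = rng.normal(size=(2 * d_model, num_classes)) * 0.1  # untrained linear layer
class_probs = softmax(fused @ w_out)  # one probability per hypothetical disease class
```

In a trained model the stand-in embeddings would come from a pre-trained text encoder and a learned lab-value encoder, and the weights would be fit to labeled cohorts; the sketch only shows how the shapes fit together.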

Taxonomy

Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention