Can Large Language Models Logically Predict Myocardial Infarction? Evaluation based on UK Biobank Cohort

Yuxing Zhi; Yuan Guo; Kai Yuan; Hesong Wang; Heng Xu; Haina Yao; Albert C Yang; Guangrui Huang; Yuping Duan

arXiv:2409.14478·cs.AI·September 24, 2024

Open Access

TL;DR

This study assesses whether state-of-the-art large language models like ChatGPT and GPT-4 can accurately predict myocardial infarction risk from UK Biobank data, revealing current limitations in clinical decision support applications.

Contribution

It provides a quantitative evaluation of LLMs' ability to predict MI risk using real-world medical data and compares their performance with traditional models and medical indices.

Findings

01 LLMs currently lack sufficient accuracy for clinical MI prediction

02 Chain of Thought prompting helps evaluate logical inference in LLMs

03 Future medical LLMs should integrate domain knowledge for better performance

Abstract

Background: Large language models (LLMs) have advanced rapidly and are being applied to clinical decision support. However, high-quality evidence is urgently needed on the potential and limitations of LLMs in making accurate clinical decisions from real-world medical data. Objective: To evaluate quantitatively whether general-purpose state-of-the-art LLMs (ChatGPT and GPT-4) can predict the incidence risk of myocardial infarction (MI) through logical inference, and to compare them against other models for a comprehensive assessment of LLM performance. Methods: In this retrospective cohort study, 482,310 participants recruited into the UK Biobank database from 2006 to 2010 were initially included and later resampled into a final cohort of 690 participants. For each participant, tabular data on MI risk factors were transformed into standardized textual descriptions…
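The tabular-to-text step mentioned in the Methods can be illustrated with a minimal sketch. The field names, sentence templates, and example values below are hypothetical and chosen only for illustration; the excerpt above does not give the paper's actual template or its list of risk factors.

```python
from typing import Mapping

# Hypothetical field names and wording (assumptions, not the paper's template).
TEMPLATES = {
    "age": "The participant is {value} years old.",
    "sex": "Sex: {value}.",
    "smoking_status": "Smoking status: {value}.",
    "systolic_bp": "Systolic blood pressure is {value} mmHg.",
}


def describe_participant(record: Mapping[str, object]) -> str:
    """Turn one tabular record into a fixed-order textual description.

    Fields that are absent or None are skipped, so every participant is
    described with the same sentence order and wording.
    """
    sentences = []
    for field, template in TEMPLATES.items():
        value = record.get(field)
        if value is None:
            continue  # leave missing measurements out of the description
        sentences.append(template.format(value=value))
    return " ".join(sentences)


# Made-up example record, not UK Biobank data.
example = {
    "age": 58,
    "sex": "male",
    "smoking_status": "former smoker",
    "systolic_bp": 142,
}
print(describe_participant(example))
```

Using fixed templates keeps the wording identical across participants, so differences in a model's answers can be attributed to the risk-factor values rather than to phrasing.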

Taxonomy

Topics: Machine Learning in Healthcare