A foundation model for human-AI collaboration in medical literature mining

Zifeng Wang; Lang Cao; Qiao Jin; Joey Chan; Nicholas Wan; Behdad Afzali; Hyun-Jin Cho; Chang-In Choi; Mehdi Emamverdi; Manjot K. Gill; Sun-Hyung Kim; Yijia Li; Yi Liu; Hanley Ong; Justin Rousseau; Irfan Sheikh; Jenny J. Wei; Ziyang Xu; Christopher M. Zallek; Kyungsang Kim; Yifan Peng; Zhiyong Lu; Jimeng Sun

arXiv:2501.16255·cs.CL·January 28, 2025

TL;DR

LEADS, a specialized AI foundation model, significantly improves medical literature mining by enhancing accuracy and efficiency in study selection and data extraction tasks, outperforming generic models and streamlining expert workflows.

Contribution

We introduce LEADS, a novel foundation model trained on extensive medical literature data, tailored for study search, screening, and data extraction, demonstrating superior performance over generic models.

Findings

01

LEADS improves recall in study selection from 0.77 to 0.81.

02

LEADS reduces data extraction time by 26.9%.

03

Experts using LEADS achieve higher accuracy in data extraction.
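
To make the recall figure concrete, here is a minimal sketch of how recall is conventionally computed for study selection and how the reported change from 0.77 to 0.81 translates into a relative gain. The study identifiers and the helper names `recall` and `relative_change` are hypothetical and chosen only for illustration; this is not the paper's evaluation code.

```python
def recall(selected, relevant):
    """Fraction of the relevant studies that appear in the selected set."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant studies")
    return len(relevant & set(selected)) / len(relevant)


def relative_change(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Hypothetical study identifiers, invented for this example.
relevant_studies = {"s1", "s2", "s3", "s4", "s5"}
selected_studies = {"s1", "s2", "s4", "s9"}

# Three of the five relevant studies were selected.
example_recall = recall(selected_studies, relevant_studies)

# Relative gain implied by the recall values listed in the findings.
recall_gain_pct = round(relative_change(0.77, 0.81) * 100, 1)

print(f"example recall: {example_recall:.2f}")
print(f"relative recall gain: {recall_gain_pct}%")
```

Under these assumptions, the move from 0.77 to 0.81 is an absolute gain of 0.04 in recall, or roughly a 5% relative gain.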

Abstract

Systematic literature review is essential for evidence-based medicine, requiring comprehensive analysis of clinical trial publications. However, the application of artificial intelligence (AI) models for medical literature mining has been limited by insufficient training and evaluation across broad therapeutic areas and diverse tasks. Here, we present LEADS, an AI foundation model for study search, screening, and data extraction from medical literature. The model is trained on 633,759 instruction data points in LEADSInstruct, curated from 21,335 systematic reviews, 453,625 clinical trial publications, and 27,015 clinical trial registries. We showed that LEADS demonstrates consistent improvements over four cutting-edge generic large language models (LLMs) on six tasks. Furthermore, LEADS enhances expert workflows by providing supportive references following expert requests, streamlining…

Code & Models

Repositories

pat-jj/deepretrieval
pytorch

Models

zifeng-ai/leads-mistral-7b-v1
model · 19 dl · ♡ 3

Datasets

zifeng-ai/LEADSInstruct
dataset · 259 dl
