Entry-level guide to the use of large language models for medical research

Qiao Jin; Nicholas Wan; Robert Leaman; Shubo Tian; Zhizheng Wang; Yifan Yang; Zifeng Wang; Guangzhi Xiong; Po-Ting Lai; Qingqing Zhu; Benjamin Hou; Maame Sarfo-Gyamfi; Gongbo Zhang; Aidan Gilson; Balu Bhasuran; Zhe He; Aidong Zhang; Jimeng Sun; Chunhua Weng; Ronald M. Summers; Qingyu Chen; Yifan Peng; Zhiyong Lu

arXiv:2410.18856·cs.AI·May 20, 2026·2 cites

TL;DR

This paper gives healthcare professionals a practical, step-by-step guide to using large language models effectively and safely in a range of medical research and clinical tasks.

Contribution

It introduces an actionable workflow and best practices for integrating LLMs into healthcare, focusing on task formulation, model selection, prompt engineering, fine-tuning, and deployment.

Findings

01 Guidelines for selecting appropriate LLMs for medical tasks

02 Strategies for prompt engineering and model fine-tuning

03 Considerations for deployment, including ethics and regulation

Abstract

Frontier large language models (LLMs), such as GPT-5, Claude 4.5, Gemini 3, Llama 4, and DeepSeek-R1, are a transformative class of AI tools that can generate human-like responses across diverse contexts and adapt to novel tasks by following human instructions, giving them the potential to reshape many aspects of healthcare. Their potential applications span a broad range of medical tasks, such as clinical documentation, matching patients to clinical trials, and answering medical questions. In this paper, we propose an actionable guideline, together with a set of best practices, to help healthcare professionals use LLMs more effectively and efficiently in their work. The overall workflow consists of several main phases, including formulating the task, choosing LLMs, prompt engineering, fine-tuning, and model deployment. We begin by discussing critical considerations in identifying…
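
Prompt engineering is one of the workflow phases named above. As a hedged illustration only (this is not code from the paper or its repository; the `build_prompt` function, the `Example` type, the `WORKFLOW_PHASES` tuple, and the "Question:/Answer:" layout are all hypothetical choices), a zero-shot or few-shot prompt for a medical question-answering task might be assembled as plain text like this:

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# The workflow phases named in the abstract, in order (listed here only for reference).
WORKFLOW_PHASES = (
    "formulating the task",
    "choosing LLMs",
    "prompt engineering",
    "fine-tuning",
    "model deployment",
)


@dataclass(frozen=True)
class Example:
    """One hypothetical few-shot demonstration: a question and its reference answer."""
    question: str
    answer: str


def build_prompt(
    instruction: str,
    question: str,
    examples: Optional[Sequence[Example]] = None,
) -> str:
    """Assemble a plain-text prompt: the instruction, any few-shot examples, then the new question.

    With no examples this is a zero-shot prompt; with examples it is a few-shot prompt.
    The "Question:/Answer:" layout is an illustrative assumption, not a required format.
    """
    parts = [instruction.strip(), ""]
    for ex in examples or ():
        # Each demonstration is followed by a blank line to separate it from the next.
        parts += [f"Question: {ex.question}", f"Answer: {ex.answer}", ""]
    # End with an open "Answer:" line so the model completes the answer.
    parts += [f"Question: {question}", "Answer:"]
    return "\n".join(parts)


if __name__ == "__main__":
    demo = [Example("Deficiency of which vitamin causes scurvy?", "Vitamin C")]
    print(build_prompt(
        "You are assisting a clinician. Answer briefly.",
        "Deficiency of which vitamin causes rickets?",
        demo,
    ))
```

The returned string would then be sent to whichever model was picked in the model-selection phase; no particular LLM API is assumed here. Keeping prompt construction in one small function makes it easy to compare zero-shot and few-shot variants on the same questions before deciding whether fine-tuning is needed.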

Code & Models

Repositories

ncbi-nlp/llm-medicine-primer (official)

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Health Systems, Economic Evaluations, Quality of Life

Methods: ALIGN · Sparse Evolutionary Training