A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations

Jinqiang Wang; Huansheng Ning; Yi Peng; Qikai Wei; Daniel Tesfai; Wenwei Mao; Tao Zhu; Runhe Huang

arXiv:2406.10303·cs.CL·September 24, 2024

TL;DR

This survey reviews the development, training, and evaluation of medical large language models based on general-purpose LLMs, highlighting datasets, methodologies, challenges, and future research directions in medical applications.

Contribution

It provides a comprehensive, fine-grained overview of training medical LLMs from open-source general models, including dataset construction, training paradigms, and evaluation benchmarks.

Findings

01

Medical LLMs show strong capabilities in medical consultation and diagnosis, and can simulate doctor-patient dialogues.

02

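Continued training of open-source general LLMs requires far fewer computational resources than training from scratch and offers better patient privacy protection than API-based solutions.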

03

The survey identifies key challenges and future research directions in medical LLM development.
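To make the cost claim in finding 02 concrete, here is a rough back-of-the-envelope sketch. It assumes the widely used approximation that training a dense transformer costs about 6 × N × D FLOPs, where N is the parameter count and D the number of training tokens. The 7B-parameter size, the 2T-token pretraining budget, and the 5B-token medical corpus are hypothetical placeholders, not figures from the survey.

```python
# Rough training-compute comparison using the common approximation
# C ~= 6 * N * D FLOPs (N = parameters, D = training tokens).
# All model sizes and token counts used below are hypothetical placeholders.


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs for a dense transformer."""
    return 6.0 * n_params * n_tokens


def compare_costs(n_params: float, pretrain_tokens: float, domain_tokens: float) -> dict:
    """Compare pretraining from scratch with continued training on a domain corpus."""
    scratch = training_flops(n_params, pretrain_tokens)
    continued = training_flops(n_params, domain_tokens)
    return {
        "scratch_flops": scratch,
        "continued_flops": continued,
        "ratio": scratch / continued,
    }


if __name__ == "__main__":
    # Hypothetical setting: 7B parameters, 2T pretraining tokens vs 5B medical tokens.
    result = compare_costs(7e9, 2e12, 5e9)
    print(f"from scratch: {result['scratch_flops']:.2e} FLOPs")
    print(f"continued:    {result['continued_flops']:.2e} FLOPs")
    print(f"ratio:        {result['ratio']:.0f}x")
```

Because N is the same in both cases, the ratio reduces to the ratio of token counts (here 2T / 5B = 400). The estimate ignores data preparation, evaluation runs, and any further savings from parameter-efficient methods, so treat it as an order-of-magnitude illustration only.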

Abstract

Large Language Models (LLMs) have demonstrated surprising performance across various natural language processing tasks. Recently, medical LLMs enhanced with domain-specific knowledge have exhibited excellent capabilities in medical consultation and diagnosis. These models can smoothly simulate doctor-patient dialogues and provide professional medical advice. Most medical LLMs are developed through continued training of open-source general LLMs, an approach that requires significantly fewer computational resources than training LLMs from scratch. This approach also offers better patient privacy protection than API-based solutions. Given these advantages, this survey systematically summarizes, from a more fine-grained perspective, how to train medical LLMs based on open-source general LLMs. It covers (a) how to acquire a training corpus and construct customized medical training sets, (b) how…
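Point (a) of the abstract concerns constructing customized medical training sets. One common pattern, shown here as a minimal sketch, converts a doctor-patient dialogue into instruction-tuning records in the instruction/input/output style and serializes them as JSON Lines. The InstructionRecord schema, the SYSTEM_INSTRUCTION text, and the helper names dialogue_to_records and to_jsonl are hypothetical; the survey may describe other formats and pipelines.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple

# Hypothetical record layout in the common instruction/input/output style;
# this exact schema is an assumption, not one prescribed by the survey.
@dataclass
class InstructionRecord:
    instruction: str
    input: str
    output: str


# Hypothetical task instruction shared by every generated record.
SYSTEM_INSTRUCTION = (
    "You are a medical assistant. Reply to the patient based on the conversation so far."
)


def dialogue_to_records(turns: List[Tuple[str, str]]) -> List[InstructionRecord]:
    """Turn a list of (speaker, text) turns into one record per doctor reply.

    Each doctor turn becomes a target output; all earlier turns become its context.
    A doctor turn with no preceding context is skipped.
    """
    records: List[InstructionRecord] = []
    history: List[str] = []
    for speaker, text in turns:
        if speaker == "doctor" and history:
            records.append(
                InstructionRecord(
                    instruction=SYSTEM_INSTRUCTION,
                    input="\n".join(history),
                    output=text,
                )
            )
        history.append(f"{speaker}: {text}")
    return records


def to_jsonl(records: List[InstructionRecord]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


if __name__ == "__main__":
    # Hypothetical toy dialogue used only to illustrate the conversion.
    dialogue = [
        ("patient", "I have had a dry cough for two weeks."),
        ("doctor", "Do you also have a fever or shortness of breath?"),
        ("patient", "A mild fever in the evenings."),
        ("doctor", "Please see a clinician for an in-person examination."),
    ]
    records = dialogue_to_records(dialogue)
    print(len(records))
    print(to_jsonl(records))
```

Splitting at every doctor turn yields several training examples per conversation, each conditioned on the full preceding history; ensure_ascii=False keeps non-English medical text readable in the output file.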

Code & Models

Repositories

jqwangai/medical-llm (official)

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare