What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs
Xinlan Yan, Di Wu, Yibin Lei, Christof Monz, Iacer Calixto

TL;DR
This paper introduces S-MedQA, a medical QA dataset of over 24k examples across 15 specialties, and investigates how clinical specialty data influences LLM performance. The results suggest that the shift from the general domain to the medical domain may matter more than specialty-specific fine-tuning.
Contribution
The paper contributes S-MedQA, a dataset for benchmarking medical LLMs across clinical specialties, and an analysis of how specialty data affects model performance and knowledge representation.
Findings
Training on a specialty does not guarantee better performance in that specialty.
Token probabilities of relevant terms increase across all specialties regardless of training data.
The shift from the general domain to the medical domain has a significant impact on model improvement.
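The first finding can be checked with a cross-specialty comparison: score each fine-tuned model on every specialty, then see which training specialty wins on a given test specialty. The sketch below is a minimal illustration under stated assumptions, not the authors' evaluation code. The record layout (a `specialties` list and an `answer` field), the function names, and the toy predictions are all hypothetical and exist only to show the bookkeeping; they are not results from the paper. A question with several specialty annotations counts toward each of its specialties.

```python
from collections import defaultdict


def per_specialty_accuracy(examples, predictions):
    """Accuracy per specialty; a multi-label example counts once per label.

    examples: list of dicts with 'specialties' (list of str) and 'answer'.
    predictions: list of predicted answers, aligned with examples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        for specialty in example["specialties"]:
            total[specialty] += 1
            correct[specialty] += int(predicted == example["answer"])
    return {s: correct[s] / total[s] for s in total}


def best_training_specialty(accuracy_by_model, test_specialty):
    """Return the training specialty whose model scores highest on test_specialty.

    accuracy_by_model: {training_specialty: {test_specialty: accuracy}}.
    """
    return max(
        accuracy_by_model,
        key=lambda trained_on: accuracy_by_model[trained_on].get(test_specialty, 0.0),
    )


# Toy, made-up inputs (hypothetical; NOT data or results from the paper).
examples = [
    {"specialties": ["cardiology"], "answer": "A"},
    {"specialties": ["neurology"], "answer": "B"},
    {"specialties": ["cardiology", "neurology"], "answer": "C"},
    {"specialties": ["neurology"], "answer": "D"},
]
predictions_by_model = {
    "cardiology": ["A", "B", "A", "D"],  # model fine-tuned on cardiology
    "neurology": ["A", "C", "C", "A"],   # model fine-tuned on neurology
}

accuracy_by_model = {
    trained_on: per_specialty_accuracy(examples, predictions)
    for trained_on, predictions in predictions_by_model.items()
}

for test_specialty in ("cardiology", "neurology"):
    winner = best_training_specialty(accuracy_by_model, test_specialty)
    print(f"best model on {test_specialty}: trained on {winner}")
```

In this toy setup the model trained on the other specialty happens to win on each test specialty, which is the kind of off-diagonal pattern the finding describes. With real data, the same table would be built from model outputs on the S-MedQA test split.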
Abstract
In this paper, we introduce S-MedQA, an English medical question-answering (QA) dataset designed for benchmarking large language models (LLMs) in fine-grained clinical specialties. S-MedQA consists of over 24k examples, covering 15 medical specialties, with QA pairs that can have multiple specialty annotations, such as when a question is cross-disciplinary. The dataset is constructed using both machine and expert verification to maximize data availability and reliability. We use S-MedQA to investigate the role of clinical specialties in the knowledge-intensive scenario of medical QA. Our results show that training on data from a clinical specialty does not necessarily lead to the best performance on that specialty. Additionally, regardless of the specialty the LLM was fine-tuned on, token probabilities of clinically relevant terms consistently increase across all specialties. Based on…
Taxonomy
Topics: Clinical Reasoning and Diagnostic Skills · Health and Medical Research Impacts
