Does Biomedical Training Lead to Better Medical Performance?

Amin Dada; Marie Bauer; Amanda Butler Contreras; Osman Alperen Koraş; Constantin Marc Seibold; Kaleb E Smith; Jens Kleesiek

arXiv:2404.04067 · cs.CL · October 14, 2025 · 2 cites



TL;DR

This study systematically evaluates biomedical LLMs on medical tasks, revealing that biomedical fine-tuning often reduces performance and general models can outperform domain-specific models, highlighting a trade-off in model training.

Contribution

The study provides the first comprehensive evaluation of how biomedical training affects performance on medical tasks, revealing potential drawbacks of domain-specific fine-tuning.

Findings

1. Biomedical fine-tuning often decreases model performance on medical tasks.
2. General-domain models can outperform biomedical models on medical tasks.
3. The released open-source datasets and scripts facilitate further research.

Abstract

Large Language Models (LLMs) are expected to significantly contribute to patient care, diagnostics, and administrative processes. Emerging biomedical LLMs aim to address healthcare-specific challenges, including privacy demands and computational constraints. Assessing the models' suitability for this sensitive application area is of the utmost importance. However, biomedical training has not been systematically evaluated on medical tasks. This study investigates the effect of biomedical training in the context of six practical medical tasks, evaluating 25 models. In contrast to previous evaluations, our results reveal a performance decline in nine out of twelve biomedical models after fine-tuning, particularly on tasks involving hallucinations, ICD10 coding, and instruction adherence. General-domain models like Meta-Llama-3.1-70B-Instruct outperformed their biomedical counterparts,…
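The headline figure above (nine of twelve biomedical models declining after fine-tuning) is a paired tally: each biomedical model is presumably compared against the general-domain model it was trained from. A minimal Python sketch of such a tally, assuming one aggregate score per model where higher is better. The `ModelPair` type, `count_declines` helper, model names, and scores are all hypothetical illustrations rather than the paper's data or code, and the paper's actual scoring and aggregation across its six tasks may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPair:
    """Hypothetical pairing of a biomedical model with its general-domain base."""
    biomedical: str
    base: str
    biomedical_score: float  # assumed aggregate score, higher is better
    base_score: float


def count_declines(pairs: list[ModelPair]) -> tuple[int, int]:
    """Return (pairs where biomedical training lowered the score, total pairs)."""
    declines = sum(1 for p in pairs if p.biomedical_score < p.base_score)
    return declines, len(pairs)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are NOT results from the paper.
    pairs = [
        ModelPair("bio-model-a", "base-model-a", 0.58, 0.61),
        ModelPair("bio-model-b", "base-model-b", 0.66, 0.63),
        ModelPair("bio-model-c", "base-model-c", 0.49, 0.55),
    ]
    declined, total = count_declines(pairs)
    print(f"{declined} of {total} biomedical models scored below their base model")
```

With these toy values the script prints `2 of 3 biomedical models scored below their base model`. A per-task version would apply the same comparison to each task's score separately.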


Code & Models

Repositories

tio-ikim/clue (official)


Taxonomy

Topics: Natural Language Processing Techniques

Methods: ALIGN