Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution

Qiao Jin; Yin Fang; Lauren He; Yifan Yang; Guangzhi Xiong; Zhizheng Wang; Nicholas Wan; Joey Chan; Donald C. Comeau; Robert Leaman; Charalampos S. Floudas; Aidong Zhang; Michael F. Chiang; Yifan Peng; and Zhiyong Lu

arXiv:2603.05308 · cs.CL · May 19, 2026



TL;DR

Med-V1 is a family of small language models with three billion parameters, trained on synthetic biomedical data, that performs comparably to frontier models such as GPT-5 on evidence attribution and hallucination detection.

Contribution

This paper introduces Med-V1, a lightweight family of biomedical language models that substantially outperforms its base models and performs comparably to much larger frontier models on evidence verification tasks.

Findings

01

Med-V1 outperforms its base models by 27.0% to 71.3% on five biomedical benchmarks unified into a verification format.

02

Citation instructions significantly influence hallucination rates in LLMs.

03

Med-V1 can identify evidence misattributions in clinical guidelines.

Abstract

Assessing whether an article supports an assertion is essential for hallucination detection and claim verification. While large language models (LLMs) have the potential to automate this task, achieving strong performance requires frontier models such as GPT-5 that are prohibitively expensive to deploy at scale. To efficiently perform biomedical evidence attribution, we present Med-V1, a family of small language models with only three billion parameters. Trained on high-quality synthetic data newly developed in this study, Med-V1 substantially outperforms (+27.0% to +71.3%) its base models on five biomedical benchmarks unified into a verification format. Despite its smaller size, Med-V1 performs comparably to frontier LLMs such as GPT-5, along with high-quality explanations for its predictions. We use Med-V1 to conduct a first-of-its-kind use case study that quantifies hallucinations in…
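The verification task described in the abstract (given an article and an assertion, decide whether the article supports the assertion) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record fields, the two-way label set, the toy examples, and the word-overlap baseline are hypothetical and are not taken from the paper, its benchmarks, or the Med-V1 models. The sketch only shows the shape of the problem and how accuracy over such records might be computed.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical label set; the paper's actual labels may differ.
SUPPORTS = "supports"
DOES_NOT_SUPPORT = "does_not_support"


@dataclass(frozen=True)
class VerificationExample:
    """One article/assertion pair in an assumed verification format."""
    article: str
    assertion: str
    label: str


def tokens(text: str) -> set:
    """Lower-case alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_baseline(example: VerificationExample) -> str:
    """Toy stand-in for a verifier (NOT Med-V1): predicts 'supports'
    only when every assertion token also appears in the article."""
    if tokens(example.assertion) <= tokens(example.article):
        return SUPPORTS
    return DOES_NOT_SUPPORT


def accuracy(examples: List[VerificationExample],
             predict: Callable[[VerificationExample], str]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(predict(ex) == ex.label for ex in examples)
    return correct / len(examples)


# Invented toy data, for illustration only.
EXAMPLES = [
    VerificationExample(
        article="Metformin lowered HbA1c in adults with type 2 diabetes.",
        assertion="Metformin lowered HbA1c",
        label=SUPPORTS,
    ),
    VerificationExample(
        article="Aspirin did not reduce stroke risk in this cohort.",
        assertion="Aspirin increased bleeding",
        label=DOES_NOT_SUPPORT,
    ),
    VerificationExample(
        article="Aspirin did not reduce stroke risk in this cohort.",
        assertion="Aspirin reduce stroke risk",
        label=DOES_NOT_SUPPORT,
    ),
]

if __name__ == "__main__":
    print(f"baseline accuracy: {accuracy(EXAMPLES, overlap_baseline):.2f}")
```

The third toy example is one the word-overlap baseline gets wrong, because the article negates the assertion while sharing all of its words. Cases like this are why attribution is treated as a language-understanding task rather than a string-matching one.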


Code & Models

Repositories

ncbi-nlp/Med-V1 (GitHub)


Taxonomy

Topics: Machine Learning in Healthcare · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)