Do Clinical Question Answering Systems Really Need Specialised Medical Fine Tuning?

Sushant Kumar Ray; Gautam Siddharth Kashyap; Sahil Tripathi; Nipun Joshi; Vijay Govindarajan; Rafiq Ali; Jiechao Gao; Usman Naseem

arXiv:2601.12812 · cs.CL · January 21, 2026

TL;DR

This paper introduces MEDASSESS-X, an inference-time alignment method that enhances clinical question-answering performance across various LLMs without the need for domain-specific fine-tuning, challenging the necessity of specialised medical models.

Contribution

MEDASSESS-X shows that inference-time alignment with lightweight steering vectors can improve the clinical question-answering performance of LLMs without costly fine-tuning, countering the SPECIALISATION FALLACY.

Findings

01. Accuracy improved by up to 6%.
02. Factual consistency improved by 7%.
03. Safety error rate reduced by up to 50%.

Abstract

Clinical Question-Answering (CQA) systems in industry increasingly rely on Large Language Models (LLMs), yet their deployment is often guided by the assumption that domain-specific fine-tuning is essential. Although specialised medical LLMs such as BioBERT, BioGPT, and PubMedBERT remain popular, they face practical limitations including narrow coverage, high retraining costs, and limited adaptability. Efforts based on Supervised Fine-Tuning (SFT) have attempted to address these limitations but continue to reinforce what we term the SPECIALISATION FALLACY: the belief that specialised medical LLMs are inherently superior for CQA. To challenge this assumption, we introduce MEDASSESS-X, a CQA framework oriented toward industry deployment that applies alignment at inference time rather than through SFT. MEDASSESS-X uses lightweight steering vectors to guide model activations toward medically…
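The abstract is cut off before it explains how MEDASSESS-X builds or applies its steering vectors, so the following is only a rough illustration of inference-time activation steering in general and is not the authors' method. It uses a common difference-of-means construction: the steering direction is the mean activation over one set of examples minus the mean over a contrasting set. A scaled copy of that direction is then added to a hidden state, and no weights are updated. The function names, the strength parameter `alpha`, and the toy 3-dimensional vectors are all hypothetical. In a real model, the shift would be applied to a chosen layer's activations during the forward pass.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(desired: List[Vector], undesired: List[Vector]) -> Vector:
    """Hypothetical difference-of-means direction: mean(desired) - mean(undesired)."""
    mu_pos = mean_vector(desired)
    mu_neg = mean_vector(undesired)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift one hidden state along the steering direction; alpha sets the strength."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    # Toy activations (illustrative values only, not taken from the paper).
    grounded = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
    ungrounded = [[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]]
    v = steering_vector(grounded, ungrounded)
    print(v)  # [2.0, -2.0, 2.0]
    print(steer([0.5, 0.5, 0.5], v, alpha=0.25))  # [1.0, 0.0, 1.0]
```

The base model stays frozen in this design, and only a small vector per steered layer needs to be stored. That property is what makes steering "lightweight" compared with retraining a specialised model.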

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Multimodal Machine Learning Applications