Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs

Chenqian Le; Ziheng Gong; Chihang Wang; Haowei Ni; Panfeng Li; Xupeng Chen

arXiv:2506.12182·cs.CL·November 12, 2025

TL;DR

This study investigates how prompt design and lightweight fine-tuning influence the performance of open-source LLMs on biomedical question answering, highlighting the importance of reasoning-aware prompts and their scale-dependent effects.

Contribution

The paper analyzes instruction tuning and Chain-of-Thought prompting for medical QA, showing how their effects vary across model families and sizes.

Findings

01. CoT prompting alone can improve reasoning in zero-shot settings.

02. Instruction tuning significantly boosts accuracy.

03. Fine-tuning on CoT prompts does not universally help and may degrade performance for certain larger models.

Abstract

Large language models (LLMs) have shown great potential in medical question answering (MedQA), yet adapting them to biomedical reasoning remains challenging due to domain-specific complexity and limited supervision. In this work, we study how prompt design and lightweight fine-tuning affect the performance of open-source LLMs on PubMedQA, a benchmark for multiple-choice biomedical questions. We focus on two widely used prompting strategies - standard instruction prompts and Chain-of-Thought (CoT) prompts - and apply QLoRA for parameter-efficient instruction tuning. Across multiple model families and sizes, our experiments show that CoT prompting alone can improve reasoning in zero-shot settings, while instruction tuning significantly boosts accuracy. However, fine-tuning on CoT prompts does not universally enhance performance and may even degrade it for certain larger models. These…
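To make the two prompting strategies concrete, here is a minimal Python sketch of a standard instruction prompt and a CoT prompt for a PubMedQA-style item. The template wording, the function names, and the yes/no/maybe label set are illustrative assumptions, not the templates used in the paper.

```python
import re
from typing import Optional

# Assumed label set for a PubMedQA-style item (not taken from the paper).
LABELS = ("yes", "no", "maybe")


def build_standard_prompt(question: str, context: str) -> str:
    """Standard instruction prompt: ask directly for the label (hypothetical wording)."""
    return (
        "Answer the biomedical question using the context.\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Answer with one word: yes, no, or maybe.\n"
        "Answer:"
    )


def build_cot_prompt(question: str, context: str) -> str:
    """CoT prompt: ask for reasoning first, then a marked final label (hypothetical wording)."""
    return (
        "Answer the biomedical question using the context.\n"
        f"Context: {context}\n"
        f"Question: {question}\n"
        "Let's think step by step, then finish with a line of the form "
        "'Final answer: yes', 'Final answer: no', or 'Final answer: maybe'.\n"
        "Reasoning:"
    )


def parse_answer(generation: str) -> Optional[str]:
    """Return the first label found after the last 'Final answer:' marker.

    If the marker is missing, the whole generation is scanned, which can
    pick up a stray 'no' from the reasoning text; treat that as a known
    limitation of this sketch.
    """
    text = generation.lower()
    marker = "final answer:"
    if marker in text:
        text = text.rsplit(marker, 1)[1]
    for token in re.findall(r"[a-z]+", text):
        if token in LABELS:
            return token
    return None
```

Under these assumptions, zero-shot evaluation sends either prompt to the model and scores `parse_answer` against the gold label, while instruction tuning would pair the same prompts with target outputs. The explicit "Final answer:" marker is one simple way to keep label extraction from depending on the reasoning text.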

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning

Methods (focus): Chain-of-thought prompting