Optimizing Medical Question-Answering Systems: A Comparative Study of Fine-Tuned and Zero-Shot Large Language Models with RAG Framework

Tasnimul Hassan, Md Faisal Karim, Haziq Jeelani, Elham Behnam, Robert Green, and Fayeq Jeelani Syed

arXiv:2512.05863 · cs.CL · December 8, 2025

Open Access

TL;DR

This paper develops a retrieval-augmented generation system for medical question-answering that combines domain-specific knowledge retrieval with fine-tuned open-source large language models, significantly improving accuracy and factual correctness.

Contribution

It introduces a RAG-based medical QA system using fine-tuned open-source LLMs with LoRA, demonstrating improved accuracy and reduced hallucinations compared to zero-shot models.
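For readers unfamiliar with LoRA, the standard formulation (from the general LoRA literature, not restated from this paper) freezes the pretrained weight matrix and learns only a low-rank update. The dimensions and rank in the worked count below are illustrative assumptions, not values reported by the authors.

```latex
% Standard LoRA update: W_0 is frozen, only A and B are trained.
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)

% Trainable parameters per adapted matrix: r(d + k) instead of dk.
% Illustrative values (assumed): d = k = 4096, r = 8
r(d + k) = 8 \cdot 8192 = 65{,}536,
\qquad dk = 4096^2 = 16{,}777{,}216,
\qquad \frac{r(d + k)}{dk} \approx 0.39\%
```

Under these assumed sizes, each adapted matrix trains well under 1% of the parameters that full fine-tuning would update, which is the sense in which LoRA makes domain specialization efficient.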

Findings

01

LLaMA 2 achieves 71.8% accuracy on PubMedQA.

02

Retrieval augmentation improves answer accuracy over baseline.

03

Grounding answers reduces unsupported content by 60%.

Abstract

Medical question-answering (QA) systems can benefit from advances in large language models (LLMs), but directly applying LLMs to the clinical domain poses challenges such as maintaining factual accuracy and avoiding hallucinations. In this paper, we present a retrieval-augmented generation (RAG) based medical QA system that combines domain-specific knowledge retrieval with open-source LLMs to answer medical questions. We fine-tune two state-of-the-art open LLMs (LLaMA 2 and Falcon) using Low-Rank Adaptation (LoRA) for efficient domain specialization. The system retrieves relevant medical literature to ground the LLM's answers, thereby improving factual correctness and reducing hallucinations. We evaluate the approach on benchmark datasets (PubMedQA and MedMCQA) and show that retrieval augmentation yields measurable improvements in answer accuracy compared to using LLMs alone. Our…
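The retrieve-then-ground step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the bag-of-words cosine scorer stands in for whatever retriever the paper uses, the three passages are toy placeholders, and the names `retrieve` and `build_prompt` are hypothetical. The resulting prompt would be passed to an LLM, which is outside the scope of the sketch.

```python
import math
import re
from collections import Counter

# Toy stand-in corpus (assumption): a real system would index medical literature.
PASSAGES = [
    "Metformin is a first-line drug for type 2 diabetes.",
    "Aspirin inhibits platelet aggregation.",
    "Amoxicillin is a penicillin-class antibiotic.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q_vec, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Prepend the retrieved passages so the model can ground its answer."""
    context = "\n".join(
        f"[{i + 1}] {p}" for i, p in enumerate(retrieve(question, passages, k))
    )
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("Which drug is first-line for type 2 diabetes?", PASSAGES))
```

Grounding here is done purely through the prompt: the model sees the retrieved passages before the question, so unsupported statements are easier to detect by checking answers against the numbered context.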

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Multimodal Machine Learning Applications