A Multi-Agent Framework for Medical AI: Leveraging Fine-Tuned GPT, LLaMA, and DeepSeek R1 for Evidence-Based and Bias-Aware Clinical Query Processing
Naeimeh Nourmohammadi, Md Meem Hossain, The Anh Han, Safina Showkat Ara, and Zia Ush Shamszaman

TL;DR
This paper introduces a multi-agent framework combining fine-tuned LLMs, evidence retrieval, and bias detection to improve the reliability and safety of AI-driven medical question answering systems.
Contribution
It presents a novel multi-agent architecture that integrates specialized LLMs with evidence grounding and bias checks for clinical QA, enhancing answer accuracy and trustworthiness.
Findings
DeepSeek R1 outperforms BioGPT in benchmark scores.
The full system achieves 87% accuracy in clinical QA.
Evidence augmentation reduces response uncertainty.
Abstract
Large language models (LLMs) show promise for healthcare question answering, but clinical use is limited by weak verification, insufficient evidence grounding, and unreliable confidence signalling. We propose a multi-agent medical QA framework that combines complementary LLMs with evidence retrieval, uncertainty estimation, and bias checks to improve answer reliability. Our approach has two phases. First, we fine-tune three representative LLM families (GPT, LLaMA, and DeepSeek R1) on MedQuAD-derived medical QA data (20k+ question-answer pairs across multiple NIH domains) and benchmark generation quality. DeepSeek R1 achieves the strongest scores (ROUGE-1 0.536 ± 0.04; ROUGE-2 0.226 ± 0.03; BLEU 0.098 ± 0.018) and substantially outperforms the specialised biomedical baseline BioGPT in zero-shot evaluation. Second, we implement a modular multi-agent pipeline in which a Clinical…
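The ROUGE-1 and ROUGE-2 figures quoted in the abstract measure n-gram overlap between a generated answer and a reference answer. The following is a minimal illustrative sketch of how such scores could be computed and summarised, not the authors' evaluation code. It assumes whitespace tokenisation, the F1 variant of ROUGE-N, and that the reported ± values are standard deviations across test items; the abstract states none of these. The function names rouge_n_f1 and summarize are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """ROUGE-N F1 between two strings (assumption: lowercase + whitespace tokens)."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def summarize(scores):
    """Return (mean, sample standard deviation) for a list of per-item scores."""
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


if __name__ == "__main__":
    # Toy reference/candidate pairs, invented purely for illustration.
    pairs = [
        ("aspirin reduces fever and pain", "aspirin reduces pain"),
        ("insulin lowers blood glucose", "insulin raises blood pressure"),
    ]
    r1 = [rouge_n_f1(ref, cand, n=1) for ref, cand in pairs]
    r2 = [rouge_n_f1(ref, cand, n=2) for ref, cand in pairs]
    print("ROUGE-1 mean, std:", summarize(r1))
    print("ROUGE-2 mean, std:", summarize(r2))
```

Published evaluations typically rely on an established ROUGE implementation with stemming and standardised tokenisation, so absolute values from this sketch would not be directly comparable with the numbers above.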
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Topic Modeling
