Mitigating Hallucinations in Healthcare LLMs with Granular Fact-Checking and Domain-Specific Adaptation

Musarrat Zeba; Abdullah Al Mamun; Kishoar Jahan Tithee; Debopom Sutradhar; Mohaimenul Azam Khan Raiaan; Saddam Mukta; Reem E. Mohamed; Md Rafiqul Islam; Yakub Sebastian; Mukhtar Hussain; Sami Azam

arXiv:2512.16189 · cs.CL · December 22, 2025

Open Access

TL;DR

This paper introduces a fact-checking system and a domain-specific summarization model to reduce hallucinations in healthcare language models, improving reliability and accuracy in critical medical applications.

Contribution

It presents a novel independent fact-checking module combined with a fine-tuned summarization model to enhance factual accuracy in healthcare LLM outputs.

Findings

01 The fact-checking module achieves a precision of about 0.89 and a recall of about 0.82.

02 The summarization model attains a ROUGE-1 score of 0.58 and a BERTScore of 0.91.

03 The approach significantly reduces hallucination rates in healthcare LLMs.
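
The precision and recall reported in the findings above can be combined into a single F1 score, the harmonic mean of the two. The sketch below does this arithmetic on the rounded values from this page, so the result is only an approximation and may differ slightly from any F1 the paper itself reports. The function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Rounded values as listed in the findings (approximate inputs).
approx_f1 = f1_score(0.89, 0.82)
print(round(approx_f1, 2))  # 0.85
```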

Abstract

In healthcare, it is essential for any LLM-generated output to be reliable and accurate, particularly in cases involving decision-making and patient safety. However, the outputs are often unreliable in such critical areas due to the risk of hallucinated outputs from the LLMs. To address this issue, we propose a fact-checking module that operates independently of any LLM, along with a domain-specific summarization model designed to minimize hallucination rates. Our model is fine-tuned using Low-Rank Adaptation (LoRA) on the MIMIC-III dataset and is paired with the fact-checking module, which uses numerical tests for correctness and logical checks at a granular level through discrete logic in natural language processing (NLP) to validate facts against electronic health records (EHRs). We trained the LLM on the full MIMIC-III dataset. For evaluation of the fact-checking module, we…
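
The abstract mentions numerical tests that validate facts in a summary against EHR values. The following is a minimal sketch of what one such check could look like. It is not the authors' implementation. The record fields, the regular expression, the `tolerance` parameter, and the function names `extract_numeric_claims` and `check_claims` are all assumptions made for illustration.

```python
import re

# Hypothetical EHR record: field names and values are illustrative only.
EHR_RECORD = {"heart rate": 88.0, "temperature": 38.2}


def extract_numeric_claims(summary, fields):
    """Find the first number stated shortly after each field name."""
    claims = []
    for field in fields:
        # Allow up to 20 non-digit characters between the name and the value.
        pattern = re.escape(field) + r"\D{0,20}?(\d+(?:\.\d+)?)"
        match = re.search(pattern, summary, flags=re.IGNORECASE)
        if match:
            claims.append((field, float(match.group(1))))
    return claims


def check_claims(summary, record, tolerance=0.0):
    """Map each claimed field to True if it matches the record value."""
    results = {}
    for field, value in extract_numeric_claims(summary, record):
        results[field] = abs(value - record[field]) <= tolerance
    return results


summary = "Heart rate was 88 bpm and temperature reached 39.2 C."
print(check_claims(summary, EHR_RECORD))
# {'heart rate': True, 'temperature': False}
```

In this toy example the temperature claim is flagged because the summary states 39.2 while the record holds 38.2. A real system would also need unit handling and more robust extraction of claims from free text.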

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Data Quality and Management