Addressing Hallucinations with RAG and NMISS in Italian Healthcare LLM Chatbots

Maria Paola Priola

arXiv:2412.04235 · cs.CL · January 1, 2026

Open Access

TL;DR

This paper presents a combined detection and mitigation approach using RAG and NMISS to reduce hallucinations in Italian healthcare LLM chatbots, improving answer accuracy and contextual relevance.

Contribution

It introduces NMISS for better hallucination detection and demonstrates how RAG and NMISS together enhance LLM reliability in healthcare applications.

Findings

01 GPT-4 outperforms other models in accuracy

02 NMISS benefits mid-tier models significantly

03 Combined approach reduces hallucination occurrences

Abstract

I combine detection and mitigation techniques to address hallucinations in Large Language Models (LLMs). Mitigation is achieved in a question-answering Retrieval-Augmented Generation (RAG) framework, while detection is obtained by introducing the Negative Missing Information Scoring System (NMISS), which accounts for contextual relevance in responses. While RAG mitigates hallucinations by grounding answers in external data, NMISS refines the evaluation by identifying cases where traditional metrics incorrectly flag contextually accurate responses as hallucinations. I use Italian health news articles as context to evaluate LLM performance. Results show that Gemma2 and GPT-4 outperform the other models, with GPT-4 producing answers closely aligned with reference responses. Mid-tier models, such as Llama2, Llama3, and Mistral, benefit significantly from NMISS, highlighting their ability to…
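The grounding step that the abstract attributes to RAG can be illustrated with a minimal sketch. This is not the paper's pipeline: the bag-of-words retriever, the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), and the prompt wording are all assumptions made for illustration, and the step that sends the prompt to an LLM is omitted. NMISS is not sketched here because the abstract does not give its scoring rule.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, passages, k=1):
    """Return the k passages most similar to the question.

    Assumption: a toy lexical retriever stands in for whatever
    retriever a real RAG system would use.
    """
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q_vec, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=1):
    """Assemble a prompt that restricts the answer to retrieved context.

    Assumption: the instruction wording is hypothetical, not the
    prompt used in the paper.
    """
    context = "\n".join(retrieve(question, passages, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Given a list of article passages and a question, `build_prompt` returns a string in which only the top-ranked passages appear as context, so the model is asked to answer from retrieved text and not from memory alone.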

Taxonomy

Topics: Machine Learning in Healthcare · COVID-19 diagnosis using AI · Digital Mental Health Interventions

Methods: Attention Dropout · Position-Wise Feed-Forward Layer · Linear Layer · Softmax · Multi-Head Attention · Weight Decay · Byte Pair Encoding · WordPiece · Linear Warmup With Linear Decay