Addressing Hallucinations with RAG and NMISS in Italian Healthcare LLM Chatbots
Maria Paola Priola

TL;DR
This paper presents a combined detection and mitigation approach using RAG and NMISS to reduce hallucinations in Italian healthcare LLM chatbots, improving answer accuracy and contextual relevance.
Contribution
It introduces NMISS for better hallucination detection and demonstrates how RAG and NMISS together enhance LLM reliability in healthcare applications.
Findings
GPT-4 outperforms other models in accuracy
NMISS benefits mid-tier models significantly
Combined approach reduces hallucination occurrences
Abstract
I combine detection and mitigation techniques to address hallucinations in Large Language Models (LLMs). Mitigation is achieved in a question-answering Retrieval-Augmented Generation (RAG) framework, while detection is obtained by introducing the Negative Missing Information Scoring System (NMISS), which accounts for contextual relevance in responses. While RAG mitigates hallucinations by grounding answers in external data, NMISS refines the evaluation by identifying cases where traditional metrics incorrectly flag contextually accurate responses as hallucinations. I use Italian health news articles as context to evaluate LLM performance. Results show that Gemma2 and GPT-4 outperform the other models, with GPT-4 producing answers closely aligned with reference responses. Mid-tier models, such as Llama2, Llama3, and Mistral, benefit significantly from NMISS, highlighting their ability to…
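The grounding step the abstract attributes to RAG can be illustrated with a minimal sketch. This is not the paper's pipeline: the bag-of-words cosine retriever, the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), the prompt wording, and the example passages are all assumptions made for illustration. A real system would likely use dense embeddings and an actual LLM call, and NMISS scoring is not shown because its definition is not given here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; \w is Unicode-aware, so accented Italian letters are kept.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=1):
    # Rank passages by similarity to the question and keep the top k.
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=1):
    # Hypothetical prompt template: the model is told to answer only from context.
    context = "\n".join(retrieve(question, passages, k))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Made-up passages standing in for health news articles (illustrative only).
passages = [
    "The vaccine campaign starts in October in Lombardy.",
    "Hospital waiting times rose last year.",
]
prompt = build_prompt("When does the vaccine campaign start?", passages)
```

The resulting `prompt` string would then be sent to whichever model is being evaluated; restricting the answer to retrieved context is what is meant by grounding in this sketch.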
Taxonomy
Topics: Machine Learning in Healthcare · COVID-19 diagnosis using AI · Digital Mental Health Interventions
Methods: Attention Dropout · Position-Wise Feed-Forward Layer · Linear Layer · Softmax · Multi-Head Attention · Weight Decay · Byte Pair Encoding · WordPiece · Linear Warmup With Linear Decay
