Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses
Khizar Hussain, Bradley A. Malin, Zhijun Yin, Susannah Leigh Rose, Murat Kantarcioglu

TL;DR
This paper presents a framework combining human expertise and LLMs to improve detection of hallucinations and omissions in mental health chatbot responses, enhancing safety and transparency.
Contribution
It introduces a domain-informed feature extraction framework that significantly improves hallucination and omission detection over traditional LLM judges.
Findings
Leading LLM judges achieve only 52% accuracy on mental health counseling data.
The proposed framework achieves an F1 score of up to 0.849 on hallucination detection.
Combining human expertise with automated features improves reliability in high-stakes settings.
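To make the reported metrics concrete, the following minimal sketch shows how accuracy, precision, recall, and F1 are computed from binary labels, and how a judge can score about 52% accuracy while missing nearly every hallucination. The labels are invented toy values chosen for illustration; they are not the paper's data, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = hallucination, 0 = clean)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1, using 0.0 when a ratio is undefined."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy, assumed data: 50 hallucinated and 50 clean responses.
y_true = [1] * 50 + [0] * 50
# Hypothetical judge that flags only 2 of the 50 hallucinations and nothing else.
y_pred = [1] * 2 + [0] * 48 + [0] * 50

toy = binary_metrics(y_true, y_pred)
print(toy)
```

On this balanced toy set, accuracy looks passable only because every clean response is labeled correctly; recall and F1 expose that almost all hallucinations go undetected, which is why F1 is the more informative figure in a safety-critical setting.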
Abstract
As LLM-powered chatbots are increasingly deployed in mental health services, detecting hallucinations and omissions has become critical for user safety. However, state-of-the-art LLM-as-a-judge methods often fail in high-risk healthcare contexts, where subtle errors can have serious consequences. We show that leading LLM judges achieve only 52% accuracy on mental health counseling data, with some hallucination detection approaches exhibiting near-zero recall. We identify the root cause as LLMs' inability to capture nuanced linguistic and therapeutic patterns recognized by domain experts. To address this, we propose a framework that integrates human expertise with LLMs to extract interpretable, domain-informed features across five analytical dimensions: logical consistency, entity verification, factual accuracy, linguistic uncertainty, and professional appropriateness. Experiments…
