MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations
Kensuke Mitsuzawa, Damien Garreau

TL;DR
MMD-Flagger is a new method for detecting hallucinations in large language models. It measures how far an output to inspect lies, in distribution, from counterparts generated at different temperature settings, and it shows competitive results on machine translation and summarization tasks.
Contribution
It introduces a new hallucination detection technique based on Maximum Mean Discrepancy (MMD), which tracks how the output distribution shifts as the sampling temperature varies.
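For reference, the standard definition of squared MMD between distributions P and Q under a kernel k (for example a Gaussian kernel) is given below, together with its biased plug-in estimator from samples x_1, …, x_n ~ P and y_1, …, y_m ~ Q. This summary does not state which kernel or estimator MMD-Flagger uses, so treat the choice here as an assumption.

```latex
\mathrm{MMD}^2(P, Q) = \mathbb{E}_{x, x' \sim P}[k(x, x')] + \mathbb{E}_{y, y' \sim Q}[k(y, y')] - 2\,\mathbb{E}_{x \sim P,\, y \sim Q}[k(x, y)]

\widehat{\mathrm{MMD}}^2 = \frac{1}{n^2}\sum_{i,j=1}^{n} k(x_i, x_j) + \frac{1}{m^2}\sum_{i,j=1}^{m} k(y_i, y_j) - \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} k(x_i, y_j)
```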
Findings
Effective in detecting hallucinations in LLM outputs
Competitive performance on translation and summarization datasets
Uses the shape of the MMD trajectory across temperatures for detection
Abstract
Large language models (LLMs) have become pervasive in our everyday life. Yet, a fundamental obstacle prevents their use in many critical applications: their propensity to generate fluent, human-quality content that is not grounded in reality. The detection of such hallucinations is thus of the highest importance. In this work, we propose a new method to flag hallucinated content: MMD-Flagger. It relies on Maximum Mean Discrepancy (MMD), a non-parametric distance between distributions. At a high level, MMD-Flagger tracks the MMD between the output to inspect and counterparts generated with various temperature parameters. We show empirically that inspecting the shape of this trajectory is sufficient to detect most hallucinations. This novel method is benchmarked on machine translation and summarization datasets, on which it exhibits competitive performance relative to natural…
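The trajectory idea can be sketched as follows, under loudly stated assumptions: each text is represented as a set of embedding vectors (how those embeddings are obtained is hypothetical here), MMD is estimated with a Gaussian kernel and the biased estimator, and `mmd_trajectory` simply returns the squared MMD between the inspected output and the samples generated at each temperature. The function names and the bandwidth choice are illustrative, and the paper's rule for judging the trajectory's shape is not reproduced.

```python
import numpy as np


def gaussian_kernel(x: np.ndarray, y: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gram matrix k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq_dists / (2.0 * bandwidth**2))


def mmd_squared(x: np.ndarray, y: np.ndarray, bandwidth: float = 1.0) -> float:
    """Biased (V-statistic) estimate of MMD^2 between samples x (n, d) and y (m, d)."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())


def mmd_trajectory(inspected, samples_by_temperature, bandwidth=1.0):
    """Hypothetical helper: MMD^2 between the inspected output and each temperature's samples,
    returned as a dict ordered by increasing temperature."""
    return {
        t: mmd_squared(inspected, s, bandwidth)
        for t, s in sorted(samples_by_temperature.items())
    }


# Synthetic stand-ins for embeddings (NOT real LLM outputs): the mean of the
# samples drifts with t only to exercise the functions.
rng = np.random.default_rng(0)
d = 8
inspected = rng.normal(size=(20, d))
samples = {t: rng.normal(loc=t, size=(50, d)) for t in (0.0, 0.5, 1.0, 2.0)}
traj = mmd_trajectory(inspected, samples, bandwidth=np.sqrt(d))
for t, value in traj.items():
    print(f"T={t}: MMD^2 = {value:.4f}")
```

The bandwidth of `sqrt(d)` is only a scale choice for this toy data; in practice a data-driven rule such as the median pairwise distance is a common alternative.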
