Prompt-Response Semantic Divergence Metrics for Faithfulness Hallucination and Misalignment Detection in Large Language Models

Igor Halperin

arXiv:2508.10192·cs.CL·August 15, 2025

TL;DR

This paper proposes Semantic Divergence Metrics (SDM), a lightweight framework that detects faithfulness hallucinations in large language models by measuring how far responses diverge semantically from their prompts, both across repeated answers and across paraphrases of the same prompt. The authors present it as a more prompt-aware alternative to Semantic Entropy.

Contribution

The paper introduces SDM, a prompt-aware semantic divergence framework that tests for hallucinations in LLMs by checking whether responses stay consistent across multiple answers and across paraphrased versions of the prompt.

Findings

01. SDM effectively detects faithfulness hallucinations in LLMs.

02. The combined metrics accurately classify different response types.

03. Semantic divergence scores correlate with hallucination severity.

Abstract

The proliferation of Large Language Models (LLMs) is challenged by hallucinations, critical failure modes where models generate non-factual, nonsensical, or unfaithful text. This paper introduces Semantic Divergence Metrics (SDM), a novel lightweight framework for detecting Faithfulness Hallucinations -- events in which LLM responses deviate severely from their input contexts. We focus on a specific implementation of these LLM errors, confabulations, defined as responses that are arbitrary and semantically misaligned with the user's query. Existing methods like Semantic Entropy test for arbitrariness by measuring the diversity of answers to a single, fixed prompt. Our SDM framework improves upon this by being more prompt-aware: we test for a deeper form of arbitrariness by measuring response consistency not only across multiple answers but also across multiple, semantically-equivalent…
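The abstract is cut off before it says how the divergence is computed, so the following is only an illustrative sketch of the general idea and not the paper's implementation. It assumes that the sentences of the prompts (including paraphrases) and of the sampled responses have already been assigned to a shared set of topic clusters, and it uses the Jensen-Shannon divergence between the two topic distributions as one possible divergence score. The function names, the cluster labels, and the choice of Jensen-Shannon divergence are all assumptions made for illustration.

```python
import math
from collections import Counter


def topic_distribution(labels, num_topics):
    """Turn a list of cluster labels into a probability vector over topics."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(k, 0) / total for k in range(num_topics)]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded between 0 and 1."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical cluster labels (assumption): one label per sentence, pooled over
# several paraphrased prompts and over several sampled responses.
NUM_TOPICS = 4
prompt_labels = [0, 0, 1, 1, 2, 0]
aligned_response_labels = [0, 1, 0, 1, 2, 0, 1]
drifting_response_labels = [3, 3, 2, 3, 3, 3]

prompt_dist = topic_distribution(prompt_labels, NUM_TOPICS)
aligned_score = js_divergence(
    prompt_dist, topic_distribution(aligned_response_labels, NUM_TOPICS)
)
drifting_score = js_divergence(
    prompt_dist, topic_distribution(drifting_response_labels, NUM_TOPICS)
)

print(f"aligned responses:  {aligned_score:.3f}")
print(f"drifting responses: {drifting_score:.3f}")
```

In this toy setup, responses that cover the same topics as the prompts score lower than responses that drift to an unrelated topic. In practice the cluster labels would have to come from some embedding and clustering step, which is left out here.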
