When Answers Stray from Questions: Hallucination Detection via Question-Answer Orthogonal Decomposition
Siyang Yao, Erhu Feng, Yubin Xia

TL;DR
This paper introduces QAOD, a single-pass method for hallucination detection in large language models that improves accuracy and robustness by orthogonalizing answer representations relative to questions.
Contribution
QAOD is a novel framework that projects answer representations away from question-aligned directions, enhancing hallucination detection and domain transfer in LLMs.
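As a rough illustration of the projection described above, the sketch below removes from an answer vector its component along a question vector, leaving a question-orthogonal residual. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `question_orthogonal_component` is hypothetical, and the summary does not say which hidden states, layers, or pooling produce the two vectors, so they are treated here as plain 1-D arrays.

```python
import numpy as np


def question_orthogonal_component(answer_vec, question_vec, eps=1e-12):
    """Return the part of answer_vec orthogonal to question_vec.

    Hypothetical helper: both inputs are assumed to be 1-D representation
    vectors of equal length (how they are extracted is not specified here).
    """
    a = np.asarray(answer_vec, dtype=float)
    q = np.asarray(question_vec, dtype=float)
    q_norm_sq = float(q @ q)
    if q_norm_sq < eps:
        # A (near-)zero question vector defines no direction to remove.
        return a.copy()
    # Standard vector rejection: a - proj_q(a).
    return a - (a @ q) / q_norm_sq * q


# Toy usage with made-up 2-D vectors.
residual = question_orthogonal_component([3.0, 4.0], [1.0, 0.0])
```

By construction the residual has zero inner product with the question vector, which is the property the word "orthogonal" refers to.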
Findings
QAOD achieves the best in-domain AUROC across multiple datasets.
The orthogonal-only probe surpasses white-box baselines in out-of-domain transfer.
QAOD reduces detection cost to under 25% of generation cost.
Abstract
Hallucination detection in large language models (LLMs) requires balancing accuracy, efficiency, and robustness to distribution shift. Black-box consistency methods are effective but demand repeated inference; single-pass white-box probes are efficient yet treat answer representations in isolation, often degrading sharply under domain shift. We propose QAOD (Question-Answer Orthogonal Decomposition), a single-pass framework that projects away the question-aligned direction from the answer representation to obtain a question-orthogonal component that suppresses domain-conditioned variation. To identify informative signals, QAOD further selects layers via diversity-penalized Fisher scoring and discriminative neurons via Fisher importance. To address both in-domain detection and cross-domain generalization, we design two complementary probing strategies: pairing the orthogonal component…
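The abstract mentions selecting discriminative neurons by Fisher importance but gives no formula. The sketch below uses the common two-class Fisher criterion (squared difference of class means divided by the sum of class variances) as a stand-in; whether QAOD uses this exact form is an assumption. The names `fisher_scores` and `top_k_neurons` are hypothetical, the data is a toy example, and the diversity-penalized layer selection is not sketched because the abstract does not describe it.

```python
import numpy as np


def fisher_scores(features, labels, eps=1e-12):
    """Per-neuron two-class Fisher criterion (assumed form, not from the paper).

    features: array of shape (n_samples, n_neurons)
    labels:   array of shape (n_samples,), 1 = hallucinated, 0 = faithful
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels).astype(bool)
    pos, neg = X[y], X[~y]
    between = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    within = pos.var(axis=0) + neg.var(axis=0)
    return between / (within + eps)


def top_k_neurons(features, labels, k):
    """Indices of the k neurons with the highest Fisher score."""
    scores = fisher_scores(features, labels)
    return np.argsort(scores)[::-1][:k]


# Toy data: neuron 0 separates the classes, neuron 1 does not.
X = np.array([[0.0, 1.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.0]])
y = np.array([0, 0, 1, 1])
selected = top_k_neurons(X, y, k=1)
```

A probe would then be trained only on the selected neuron indices, which keeps the detector small relative to the full hidden state.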
