TL;DR
This paper introduces PCNET, a probabilistic circuit-based method for detecting and correcting hallucinations in large language models by analyzing the latent space geometry, improving factual accuracy without degrading correct outputs.
Contribution
It presents a novel approach using probabilistic circuits to identify hallucinations as geometric anomalies, enabling targeted interventions during decoding.
Findings
Achieves up to 99% AUROC in hallucination detection across multiple benchmarks.
Outperforms state-of-the-art baselines on TruthfulQA with higher factuality scores.
Reduces mean corruption rate to 53.7%, preserving 79.3% of correct generations.
Abstract
One of the most critical challenges for Large Language Models (LLMs) is their tendency to hallucinate, i.e., to produce factually incorrect responses. Existing approaches show promising results in hallucination correction but share a key limitation: they apply corrections indiscriminately to every token, which also corrupts originally correct generations. To overcome this drawback, we propose PCNET, a Probabilistic Circuit trained as a tractable density estimator over the LLM residual stream. The method detects hallucinations as geometric anomalies on the factual manifold via exact Negative Log-Likelihood computation, and therefore, unlike existing techniques, requires no sampling, external verifiers, or weight modifications. To demonstrate its effectiveness, we exploit PCNET as a dynamic gate that distinguishes hallucinated from factual hidden states at each…
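The gating idea in the abstract (fit a tractable density to factual hidden states, then flag states whose exact negative log-likelihood is unusually high) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: a two-component mixture of diagonal Gaussians (a very shallow probabilistic circuit, with one sum node over product nodes) stands in for PCNET, the "hidden states" are synthetic 8-dimensional vectors rather than a real residual stream, and the 95th-percentile threshold and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy dimensionality; a real residual stream is far larger

# Synthetic stand-ins for hidden states of factual generations (two clusters).
cluster_a = rng.normal(0.0, 1.0, size=(500, DIM))
cluster_b = rng.normal(3.0, 1.0, size=(500, DIM))
factual_train = np.vstack([cluster_a, cluster_b])

# Fit one diagonal Gaussian per cluster (each is a product of univariate Gaussians).
means = np.stack([cluster_a.mean(axis=0), cluster_b.mean(axis=0)])           # (k, d)
variances = np.stack([cluster_a.var(axis=0), cluster_b.var(axis=0)]) + 1e-6  # (k, d)
weights = np.array([0.5, 0.5])                                               # sum-node weights


def negative_log_likelihood(x):
    """Exact NLL of each row of x under the mixture, with no sampling involved."""
    x = np.atleast_2d(x)
    diff = x[:, None, :] - means[None, :, :]                       # (n, k, d)
    log_comp = -0.5 * (
        np.log(2.0 * np.pi * variances)[None] + diff**2 / variances[None]
    ).sum(axis=2)                                                  # (n, k)
    joint = log_comp + np.log(weights)[None, :]
    peak = joint.max(axis=1, keepdims=True)                        # log-sum-exp trick
    log_p = peak[:, 0] + np.log(np.exp(joint - peak).sum(axis=1))
    return -log_p


# Hypothetical threshold: the 95th percentile of NLL on the factual training states.
threshold = np.quantile(negative_log_likelihood(factual_train), 0.95)


def flag_anomalous(hidden_states):
    """Return a boolean mask: True where a state looks off-distribution."""
    return negative_log_likelihood(hidden_states) > threshold


# Held-out factual-like states and clearly shifted "anomalous" states.
held_out = np.vstack([
    rng.normal(0.0, 1.0, size=(100, DIM)),
    rng.normal(3.0, 1.0, size=(100, DIM)),
])
anomalies = rng.normal(10.0, 1.0, size=(50, DIM))

print("flagged fraction (held-out factual):", flag_anomalous(held_out).mean())
print("flagged fraction (anomalies):", flag_anomalous(anomalies).mean())
```

In a gate of this kind, any correction would be applied only where the mask is True, leaving the remaining states untouched. How PCNET itself chooses its threshold and what intervention it applies are not specified in the excerpt above.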
