TL;DR
The paper introduces GeoDe, a geometric denoising method that improves large language models' ability to recognize their knowledge boundaries, reducing hallucinations and increasing truthfulness.
Contribution
It proposes a latent-space geometric denoising framework for abstention fine-tuning that filters out ambiguous samples near the decision boundary.
Findings
GeoDe improves truthfulness across multiple models and datasets.
The method enhances out-of-distribution generalization.
GeoDe reduces hallucination rates significantly.
Abstract
Large language models (LLMs) often exhibit hallucinations due to their inability to accurately perceive their own knowledge boundaries. Existing abstention fine-tuning methods typically partition datasets directly based on response accuracy, causing models to suffer from severe label noise near the decision boundaries and consequently exhibit high rates of abstentions or hallucinations. This paper adopts a latent space representation perspective, revealing a "gray zone" near the decision hyperplane where internal belief ambiguity constitutes the core performance bottleneck. Based on this insight, we propose the **GeoDe** (**Geo**metric **De**noising) framework for abstention fine-tuning. This method constructs a truth hyperplane using linear probes and performs "geometric denoising" by employing geometric distance as a confidence signal for abstention decisions. This approach filters…
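The idea of using distance to a probe hyperplane as a confidence signal can be illustrated with a small sketch. This is not the authors' implementation: the function names (`fit_linear_probe`, `signed_distance`, `geometric_denoise`), the `margin` threshold, and the synthetic 2-D "hidden states" are all assumptions made for illustration. The abstract does not say which layer is probed, how the threshold is chosen, or what happens to the filtered samples.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_linear_probe(X, y, lr=0.1, epochs=200):
    """Fit a logistic-regression probe (w, b) by batch gradient descent.

    X: list of feature vectors (stand-ins for hidden states).
    y: list of 0/1 labels (1 = the model answered correctly).
    """
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def signed_distance(x, w, b):
    # Signed Euclidean distance from x to the hyperplane w.x + b = 0.
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def geometric_denoise(X, w, b, margin):
    # Keep indices of samples at least `margin` away from the hyperplane;
    # samples inside the band are treated as the ambiguous "gray zone".
    return [i for i, x in enumerate(X) if abs(signed_distance(x, w, b)) >= margin]


# Synthetic demo (hypothetical data): two overlapping Gaussian clusters.
rng = random.Random(0)
X = [[rng.gauss(1.5, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
X += [[rng.gauss(-1.5, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
y = [1] * 100 + [0] * 100

w, b = fit_linear_probe(X, y)
kept = geometric_denoise(X, w, b, margin=0.5)
print(f"kept {len(kept)} of {len(X)} samples after filtering the gray zone")
```

In this sketch the sign of the distance says which side of the hyperplane a sample falls on, and its magnitude acts as the confidence signal. A larger `margin` discards more borderline samples, trading training-set size for cleaner labels.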
