Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng

TL;DR
This paper introduces a self-alignment method that uses an LLM's self-evaluation to improve its factual accuracy, reducing hallucinations without relying on external annotations.
Contribution
It proposes Self-Eval and Self-Knowledge Tuning to enable LLMs to self-assess and improve factuality through internal signals, advancing factual accuracy in language models.
Findings
Significant reduction in hallucinations on TruthfulQA and BioGEN tasks.
Enhanced confidence calibration in LLMs after self-alignment.
Improved factual accuracy over baseline Llama models.
Abstract
Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e. "hallucinations", even when they hold relevant knowledge. To address these hallucinations, current approaches typically necessitate high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. Specifically, we incorporate Self-Eval, a self-evaluation component, to prompt an LLM to validate the factuality of its own generated responses solely based on its internal knowledge. Additionally, we design Self-Knowledge Tuning (SK-Tuning) to augment the LLM's self-evaluation ability by improving the model's confidence estimation and calibration. We then utilize these self-annotated responses to fine-tune…
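The self-evaluation step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that Self-Eval yields a confidence score by normalizing the model's logits for a "True" versus "False" judgment, and that self-annotated responses are turned into preferred/dispreferred pairs before fine-tuning. The function names, the `margin` parameter, and the pair format are all hypothetical.

```python
import math
from itertools import combinations


def p_true(logit_true: float, logit_false: float) -> float:
    """Turn two option logits into a confidence that a response is factual.

    Assumption: the model is asked whether its own answer is true and we
    read the logits of the two answer options. This is a two-way softmax.
    """
    m = max(logit_true, logit_false)  # subtract the max for numerical stability
    e_true = math.exp(logit_true - m)
    e_false = math.exp(logit_false - m)
    return e_true / (e_true + e_false)


def build_preference_pairs(prompt, scored_responses, margin=0.1):
    """Build (chosen, rejected) pairs from self-evaluated responses.

    scored_responses: list of (response_text, confidence) tuples.
    margin: hypothetical threshold; pairs whose confidences differ by less
            than this are skipped as too ambiguous to serve as a signal.
    """
    pairs = []
    for (resp_a, conf_a), (resp_b, conf_b) in combinations(scored_responses, 2):
        if abs(conf_a - conf_b) < margin:
            continue
        chosen, rejected = (resp_a, resp_b) if conf_a > conf_b else (resp_b, resp_a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


# Illustrative values only; these are not results from the paper.
candidates = [
    ("Answer A", p_true(2.0, -1.0)),
    ("Answer B", p_true(-1.5, 1.0)),
]
pairs = build_preference_pairs("Some question?", candidates)
```

Pairs of this shape could feed a preference-based fine-tuning objective. The abstract as quoted here is truncated before it names the training procedure, so the sketch leaves that choice open.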
