FAITH: Factuality Alignment through Integrating Trustworthiness and Honestness
Xiaoning Dong, Chengyan Wu, Yajie Wen, Yu Chen, Yun Xue, Jing Zhang, Wei Xu, Bolei Ma

TL;DR
FAITH is a post-training framework that improves LLM factuality by integrating trustworthiness and honestness signals, external knowledge retrieval, and a reward-based fine-tuning process.
Contribution
It introduces a novel natural-language-based approach to aligning LLMs' internal trustworthiness and honesty signals with external knowledge, enhancing factual accuracy.
Findings
FAITH improves factual accuracy on four benchmarks.
The retrieval module increases consistency between internal and external knowledge.
Reward-based fine-tuning enhances truthfulness of LLM outputs.
Abstract
Large Language Models (LLMs) can generate factually inaccurate content even when they possess the corresponding knowledge, which critically undermines their reliability. Existing approaches attempt to mitigate this by incorporating uncertainty scores into QA prompts during training, but these numerical scores lack the semantic richness the LLM needs to properly understand its internal states of trustworthiness and honestness, leading to insufficient factuality alignment. We introduce FAITH (Factuality Alignment through Integrating Trustworthiness and Honestness), a post-training framework for factuality alignment that integrates natural-language uncertainty signals with external knowledge. Specifically, we augment training datasets by computing confidence scores and semantic entropy from LLM outputs and mapping them into a knowledge state quadrant that describes the model's internal knowledge possession…
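The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how answers are clustered, how confidence is computed, which thresholds are used, or how the quadrants are named. Here answers are assumed to be clustered by normalized exact match (a stand-in for a real semantic-equivalence check), confidence is assumed to be the fraction of sampled answers that match a reference answer, and the function names, the 0.5 thresholds, and the four quadrant labels are all hypothetical.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (assumed stand-in for semantic clustering)."""
    return " ".join(answer.lower().split())


def confidence(samples: list[str], reference: str) -> float:
    """Fraction of sampled answers that match the reference (assumed definition)."""
    ref = normalize(reference)
    return sum(normalize(s) == ref for s in samples) / len(samples)


def semantic_entropy(samples: list[str]) -> float:
    """Shannon entropy (in nats) over clusters of equivalent answers."""
    counts = Counter(normalize(s) for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def knowledge_state(conf: float, entropy: float,
                    conf_threshold: float = 0.5,
                    entropy_threshold: float = 0.5) -> str:
    """Map (confidence, entropy) to one of four hypothetical quadrant labels."""
    known = conf >= conf_threshold            # hypothetical threshold
    consistent = entropy <= entropy_threshold  # hypothetical threshold
    if known and consistent:
        return "known_consistent"
    if known:
        return "known_inconsistent"
    if consistent:
        return "unknown_consistent"
    return "unknown_inconsistent"


# Usage: four sampled answers to one question, with a reference answer.
samples = ["Paris", "paris", " Paris ", "Lyon"]
conf = confidence(samples, "Paris")
entropy = semantic_entropy(samples)
state = knowledge_state(conf, entropy)
```

In a natural-language setup like the one the abstract describes, a label such as `state` would then be rendered as a sentence in the training prompt rather than passed as a raw numerical score.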
