UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models
Boyang Xue, Fei Mi, Qi Zhu, Hongru Wang, Rui Wang, Sheng Wang, Erxin Yu, Xuming Hu, and Kam-Fai Wong

TL;DR
UAlign is a framework that feeds uncertainty estimations (confidence score and semantic entropy) into prompts as explicit input features, so that large language models express the factual knowledge they possess more accurately. The authors report gains in reliability and generalizability.
Contribution
The paper proposes a new method that leverages uncertainty estimations as input features for factuality alignment in LLMs, using a reward model and PPO training.
Findings
Significantly improves LLMs' factual answering accuracy
Improves confident answering of known questions and refusal of unknown questions
Demonstrates robustness across in-domain and out-of-domain tasks
Abstract
Despite demonstrating impressive capabilities, Large Language Models (LLMs) still often struggle to accurately express the factual knowledge they possess, especially in cases where the LLMs' knowledge boundaries are ambiguous. To improve LLMs' factual expressions, we propose the UAlign framework, which leverages Uncertainty estimations to represent knowledge boundaries, and then explicitly incorporates these representations as input features into prompts for LLMs to Align with factual knowledge. First, we prepare the dataset on knowledge question-answering (QA) samples by calculating two uncertainty estimations, including confidence score and semantic entropy, to represent the knowledge boundaries for LLMs. Subsequently, using the prepared dataset, we train a reward model that incorporates uncertainty estimations and then employ the Proximal Policy Optimization (PPO) algorithm for…
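The two uncertainty estimations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the confidence score is the fraction of sampled answers that match the reference answer, and it approximates semantic equivalence with normalized exact match, where a real pipeline would more likely cluster answers with an entailment model. The function names `normalize`, `confidence_score`, and `semantic_entropy` are hypothetical.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    # Crude stand-in for semantic equivalence (assumption): lowercase,
    # trim whitespace and a trailing period, collapse inner whitespace.
    return " ".join(answer.lower().strip().rstrip(".").split())


def confidence_score(samples: list[str], reference: str) -> float:
    # Assumed definition: share of sampled answers that match the reference.
    ref = normalize(reference)
    return sum(normalize(s) == ref for s in samples) / len(samples)


def semantic_entropy(samples: list[str]) -> float:
    # Group samples into equivalence clusters, estimate each cluster's
    # probability by its frequency, and take the Shannon entropy (in nats).
    clusters = Counter(normalize(s) for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in clusters.values())


# Hypothetical answers sampled from a model for one question.
samples = ["Paris", "paris.", "Lyon", "Paris"]
print(confidence_score(samples, "Paris"))  # 0.75
print(semantic_entropy(samples))
```

Under these assumptions, a high confidence score with low entropy suggests the question lies inside the model's knowledge boundary, while low confidence with high entropy suggests it does not.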
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
Methods: ALIGN
