Atomic Consistency Preference Optimization for Long-Form Question Answering
Jingfeng Chen, Raghuveer Thirukovalluru, Junlin Wang, Kaiwei Luo, Bhuwan Dhingra

TL;DR
This paper introduces ACPO, a self-supervised method that improves factual accuracy in large language models for long-form question answering by leveraging atomic consistency signals, eliminating the need for external supervision.
Contribution
ACPO is a novel self-supervised preference-tuning approach that enhances factual accuracy without external knowledge bases or stronger models, outperforming a strong supervised alignment baseline.
Findings
ACPO outperforms the strong supervised alignment baseline by 1.95 points on average with the Phi-3 and Llama3 models.
ACPO effectively improves factual reliability in long-form QA tasks.
The method uses atomic consistency signals (the agreement of individual facts across multiple stochastic responses) to identify high- and low-quality data pairs for alignment.
Abstract
Large Language Models (LLMs) often produce factoid hallucinations: plausible yet incorrect answers. A common mitigation strategy is model alignment, which improves factual accuracy by training on curated (factual, non-factual) pairs. However, this approach often relies on a stronger model (e.g., GPT-4) or an external knowledge base to assess factual correctness, and these resources may not always be accessible. To address this, we propose Atomic Consistency Preference Optimization (ACPO), a self-supervised preference-tuning method that enhances factual accuracy without external supervision. ACPO leverages atomic consistency signals (i.e., the agreement of individual facts across multiple stochastic responses) to identify high- and low-quality data pairs for model alignment. Despite being fully self-supervised, ACPO outperforms the strong supervised alignment baseline by 1.95 points averaged across…
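The idea of an atomic consistency signal can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each sampled response has already been split into atomic facts (the extraction step is not shown), and it substitutes normalized exact string matching for whatever fact-matching procedure the paper uses. The function names `consistency_scores` and `build_preference_pair` are hypothetical.

```python
from collections import Counter


def normalize(fact: str) -> str:
    """Crude stand-in for semantic matching (an assumption of this sketch):
    lowercase the fact and collapse whitespace."""
    return " ".join(fact.lower().split())


def consistency_scores(responses: list[list[str]]) -> list[float]:
    """Score each response by how widely its atomic facts are shared.

    responses[i] holds the atomic facts already extracted from the i-th
    stochastic sample for the same question.
    """
    # Count in how many responses each distinct fact appears.
    support: Counter = Counter()
    for facts in responses:
        support.update({normalize(f) for f in facts})

    n = len(responses)
    scores = []
    for facts in responses:
        unique = {normalize(f) for f in facts}
        if not unique:
            scores.append(0.0)
            continue
        # Fraction of samples containing each fact, averaged over the facts.
        scores.append(sum(support[f] / n for f in unique) / len(unique))
    return scores


def build_preference_pair(responses: list[list[str]]) -> tuple[int, int, list[float]]:
    """Return indices of the most and least self-consistent responses."""
    scores = consistency_scores(responses)
    chosen = max(range(len(scores)), key=scores.__getitem__)
    rejected = min(range(len(scores)), key=scores.__getitem__)
    return chosen, rejected, scores


if __name__ == "__main__":
    samples = [
        ["Paris is the capital of France", "France is in Europe"],
        ["paris is the capital of france", "France is in Asia"],
        ["Paris is the capital of France", "France is in Europe"],
    ]
    chosen, rejected, scores = build_preference_pair(samples)
    print(chosen, rejected, scores)
```

In this sketch, the response whose facts recur most often across samples serves as the preferred example and the least consistent one as the rejected example, so a preference pair can be formed without any external judge. Exact string matching will miss paraphrases, so a practical version would need a semantic notion of fact agreement.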
Taxonomy
Topics: Logic, Reasoning, and Knowledge · Robotics and Automated Systems · AI-based Problem Solving and Planning
Methods: Attention Is All You Need · Byte Pair Encoding · Attention Dropout · Softmax · Linear Layer · Weight Decay · Adam · Residual Connection · Multi-Head Attention
