Atomic Consistency Preference Optimization for Long-Form Question Answering

Jingfeng Chen; Raghuveer Thirukovalluru; Junlin Wang; Kaiwei Luo; Bhuwan Dhingra

arXiv:2505.09039 · cs.CL · November 11, 2025

TL;DR

This paper introduces ACPO, a self-supervised method that improves factual accuracy in large language models for long-form question answering by leveraging atomic consistency signals, eliminating the need for external supervision.

Contribution

ACPO is a novel self-supervised preference-tuning approach that enhances factual accuracy without external knowledge bases or stronger models, and it outperforms a strong supervised alignment baseline.

Findings

01

ACPO outperforms the strong supervised alignment baseline by 1.95 points in experiments with the Phi-3 and Llama3 models.

02

ACPO effectively improves factual reliability in long-form QA tasks.

03

The method uses atomic consistency signals (the agreement of individual facts across multiple stochastic responses) to identify high- and low-quality data pairs for alignment.

Abstract

Large Language Models (LLMs) often produce factoid hallucinations: plausible yet incorrect answers. A common mitigation strategy is model alignment, which improves factual accuracy by training on curated (factual, non-factual) pairs. However, this approach often relies on a stronger model (e.g., GPT-4) or an external knowledge base to assess factual correctness, and such resources may not always be accessible. To address this, we propose Atomic Consistency Preference Optimization (ACPO), a self-supervised preference-tuning method that enhances factual accuracy without external supervision. ACPO leverages atomic consistency signals (i.e., the agreement of individual facts across multiple stochastic responses) to identify high- and low-quality data pairs for model alignment. Despite being fully self-supervised, ACPO outperforms the strong supervised alignment baseline by 1.95 points averaged across…
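The atomic consistency idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each sampled response has already been split into atomic facts, and it uses normalized exact string matching as a crude stand-in for whatever fact-matching step the paper uses. The names `normalize`, `consistency_scores`, and `build_preference_pair`, along with the sample data, are hypothetical.

```python
from collections import Counter


def normalize(fact: str) -> str:
    """Crude stand-in for semantic matching (an assumption, not the paper's method):
    lowercase, trim surrounding spaces and periods, collapse whitespace."""
    return " ".join(fact.lower().strip(" .").split())


def consistency_scores(responses: list[list[str]]) -> list[float]:
    """Score each response by how widely its atomic facts are shared across all samples."""
    n = len(responses)
    # For every distinct fact, count how many responses contain it.
    support = Counter()
    for facts in responses:
        support.update({normalize(f) for f in facts})

    scores = []
    for facts in responses:
        unique = {normalize(f) for f in facts}
        if not unique:
            scores.append(0.0)
            continue
        # Fraction of samples containing each fact, averaged over this response's facts.
        scores.append(sum(support[f] / n for f in unique) / len(unique))
    return scores


def build_preference_pair(responses: list[list[str]]) -> tuple[int, int, list[float]]:
    """Return (chosen index, rejected index, scores): most vs. least consistent response."""
    scores = consistency_scores(responses)
    chosen = max(range(len(scores)), key=scores.__getitem__)
    rejected = min(range(len(scores)), key=scores.__getitem__)
    return chosen, rejected, scores


# Hypothetical stochastic samples for one question, already split into atomic facts.
samples = [
    ["Marie Curie was born in Warsaw.", "She won two Nobel Prizes."],
    ["Marie Curie was born in Warsaw", "She won two Nobel Prizes.", "She discovered polonium."],
    ["Marie Curie was born in Paris.", "She won three Nobel Prizes."],
]
chosen, rejected, scores = build_preference_pair(samples)
print(chosen, rejected, [round(s, 3) for s in scores])
```

In this sketch the response whose facts recur most often across samples becomes the preferred example and the least consistent one becomes the dispreferred example, with no external model or knowledge base involved. A realistic pipeline would need semantic rather than string-level fact matching.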

Code & Models

Repositories

jingfengsteven/acpo (PyTorch, official)

Taxonomy

Topics: Logic, Reasoning, and Knowledge · Robotics and Automated Systems · AI-based Problem Solving and Planning

Methods: Attention Is All You Need · Byte Pair Encoding · Attention Dropout · Softmax · Linear Layer · Weight Decay · Adam · Residual Connection · Multi-Head Attention