Conformal Feedback Alignment: Quantifying Answer-Level Reliability for Robust LLM Alignment

Tiejin Chen; Xiaoou Liu; Vishnu Nandam; Kuan-Ru Liou; Hua Wei

arXiv:2601.17329 · cs.LG · January 27, 2026

TL;DR

This paper introduces Conformal Feedback Alignment (CFA), a new framework that assesses answer-level reliability using conformal prediction to improve the robustness and efficiency of large language model alignment.

Contribution

CFA is presented as the first method to incorporate answer-level reliability into preference-based alignment. It grounds that reliability in conformal prediction guarantees, with the aim of improving robustness and data efficiency.

Findings

01. CFA improves alignment robustness across datasets.

02. CFA enhances data efficiency in training.

03. Answer-side uncertainty modeling complements preference weighting.

Abstract

Preference-based alignment methods such as Reinforcement Learning from Human Feedback (RLHF) learn from pairwise preferences, yet the labels are often noisy and inconsistent. Existing uncertainty-aware approaches weight preferences but ignore a more fundamental factor: the reliability of the answers being compared. To address this, we propose Conformal Feedback Alignment (CFA), a framework that grounds preference weighting in the statistical guarantees of Conformal Prediction (CP). CFA quantifies answer-level reliability by constructing conformal prediction sets with controllable coverage, then aggregates these reliabilities into principled weights for both DPO- and PPO-style training. Experiments across different datasets show that CFA improves alignment robustness and data efficiency, highlighting that modeling answer-side uncertainty complements preference-level weighting…
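
As a rough illustration of the pipeline the abstract describes, the Python sketch below calibrates a split-conformal threshold, checks whether each answer falls inside the resulting prediction set, and uses that membership to weight a DPO-style loss. It is not the authors' implementation. The nonconformity scores, the pair_weight aggregation rule, the floor parameter, and all function names are assumptions made for illustration; only the split-conformal quantile and the standard DPO logistic loss are textbook components.

```python
import math
import numpy as np


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold for target coverage 1 - alpha.

    cal_scores are nonconformity scores (higher = less plausible answer)
    computed on a held-out calibration set. With exchangeable data, the set
    {answer : score <= threshold} contains a true answer with probability
    at least 1 - alpha.
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1.0 - alpha))  # rank of the calibrated quantile
    # Too few calibration points for this alpha: the set must include everything.
    return float("inf") if k > n else float(scores[k - 1])


def pair_weight(score_chosen, score_rejected, threshold, floor=0.1):
    """Hypothetical aggregation rule (an assumption, not taken from the paper).

    Each answer counts as reliable (1.0) if it lies inside the conformal set
    and unreliable (0.0) otherwise; the pair weight is the mean of the two,
    clipped below by `floor` so that no pair is discarded entirely.
    """
    in_chosen = float(score_chosen <= threshold)
    in_rejected = float(score_rejected <= threshold)
    return max(floor, 0.5 * (in_chosen + in_rejected))


def weighted_dpo_loss(weights, margins, beta=0.1):
    """Weighted DPO-style loss.

    margins[i] is the policy-vs-reference log-ratio of the chosen answer
    minus that of the rejected answer. The per-pair loss is
    -log(sigmoid(beta * margin)), computed stably with logaddexp.
    """
    weights = np.asarray(weights, dtype=float)
    margins = np.asarray(margins, dtype=float)
    per_pair = np.logaddexp(0.0, -beta * margins)
    return float(np.mean(weights * per_pair))


if __name__ == "__main__":
    # Toy calibration scores; real scores would come from a model.
    cal = [0.1 * i for i in range(1, 11)]
    q = conformal_threshold(cal, alpha=0.2)
    w = [pair_weight(0.3, 0.95, q), pair_weight(0.2, 0.4, q)]
    print(q, w, weighted_dpo_loss(w, margins=[0.5, 1.5]))
```

In this sketch, lowering alpha raises the threshold, so more answers fall inside the prediction set and more pairs receive full weight. That is one simple way the coverage level could control how aggressively unreliable pairs are down-weighted.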

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Mobile Crowdsensing and Crowdsourcing · Topic Modeling