DARC: Disagreement-Aware Alignment via Risk-Constrained Decoding

Mingxi Zou; Jiaxiang Chen; Junfan Li; Langzhang Liang; Qifan Wang; Xu Yinghui; Zenglin Xu

arXiv:2603.08145 · cs.LG · May 19, 2026


TL;DR

DARC is a retraining-free, inference-time method that reranks candidate responses with a KL-robust (entropic) objective, making alignment more robust by explicitly accounting for preference disagreement and tail risk during response selection.

Contribution

The paper introduces a retraining-free, risk-sensitive decoding method that explicitly accounts for disagreement, enhancing alignment robustness in language models.

Findings

01

DARC reduces disagreement and tail risk in responses.

02

DARC maintains competitive average quality under noisy feedback.

03

DARC provides explicit risk controls during inference.

Abstract

Preference-based alignment methods (e.g., RLHF, DPO) typically optimize a single scalar objective, implicitly averaging over heterogeneous human preferences. In practice, systematic annotator and user-group disagreement makes mean-reward maximization brittle and susceptible to proxy over-optimization. We propose **Disagreement-Aware Alignment via Risk-Constrained Decoding (DARC)**, a retraining-free inference-time method that frames response selection as distributionally robust, risk-sensitive decision making. Given multiple preference samples or scalable disagreement proxies, DARC reranks candidates by maximizing a *KL-robust (entropic)* satisfaction objective, and provides simple deployment controls that cap or penalize the corresponding entropic risk premium relative to the mean, enabling explicit risk budgets without retraining. We provide theoretical characterization linking this…
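The reranking idea in the abstract can be illustrated with a minimal sketch. It assumes the standard entropic form of a KL-robust objective, -(1/β) log E[exp(-β r)], which equals the worst-case mean over reweightings of the preference samples with a KL penalty of 1/β. The paper's exact formulation may differ, and the names `entropic_value`, `rerank`, `beta`, `premium_cap`, and `penalty` are hypothetical, chosen here for illustration only.

```python
import math


def entropic_value(scores, beta):
    """Entropic certainty equivalent: -(1/beta) * log(mean(exp(-beta * s))).

    Assumed standard form, not taken from the paper. It is never larger
    than the plain mean, and it drops further as the scores spread out.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Shift by the minimum so every exponent is <= 0 (numerical stability).
    low = min(scores)
    total = sum(math.exp(-beta * (s - low)) for s in scores)
    return low - math.log(total / len(scores)) / beta


def rerank(candidates, beta=1.0, premium_cap=None, penalty=0.0):
    """Order candidate names best-first by the entropic objective.

    candidates: dict mapping a candidate name to a list of preference scores.
    premium_cap: if set, drop candidates whose risk premium exceeds it.
    penalty: extra weight subtracted per unit of risk premium.
    """
    ranked = []
    for name, scores in candidates.items():
        mean = sum(scores) / len(scores)
        value = entropic_value(scores, beta)
        # Risk premium: how much the mean overstates the robust value.
        premium = max(0.0, mean - value)
        if premium_cap is not None and premium > premium_cap:
            continue
        ranked.append((value - penalty * premium, name))
    ranked.sort(reverse=True)
    return [name for _, name in ranked]


# Hypothetical scores from three annotators for two candidate responses.
candidates = {
    "A": [0.9, 0.9, 0.1],  # higher mean, but one annotator strongly objects
    "B": [0.6, 0.6, 0.6],  # slightly lower mean, full agreement
}
print(rerank(candidates, beta=5.0))
print(rerank(candidates, beta=5.0, premium_cap=0.1))
```

In this toy example, candidate A wins on the plain mean, but with `beta=5.0` the contested score pulls its entropic value below B's, so B ranks first. Setting `premium_cap` acts as a hard risk budget and removes A entirely, while `penalty` is the soft alternative.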


Taxonomy

Topics: Constraint Satisfaction and Optimization · Data Management and Algorithms · Recommender Systems and Techniques