Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework

Kihyun Kim; Jiawei Zhang; Asuman Ozdaglar; Pablo A. Parrilo

arXiv:2506.05619·cs.AI·March 3, 2026


Open Access · 3 Reviews

TL;DR

This paper introduces a new preference learning framework that aligns policies proportionally with true population preferences, addressing biases and manipulation issues in existing methods, and is validated on recommendation and language model tasks.

Contribution

The paper develops an axiomatic, population-proportional preference aggregation framework that infers evaluator distributions and constructs aligned policies, incorporating novel axioms and a soft-max relaxation method.

Findings

01. Effective in aligning policies with true population preferences

02. Scalable to large language models and recommendation tasks

03. Reduces bias and manipulation in preference aggregation

Abstract

Conventional preference learning methods often prioritize opinions held more widely when aggregating preferences from multiple evaluators. This may result in policies that are biased in favor of some types of opinions or groups and susceptible to strategic manipulation. To address this issue, we develop a novel preference learning framework capable of aligning aggregate opinions and policies proportionally with the true population distribution of evaluator preferences. Grounded in social choice theory, our approach infers the feasible set of evaluator population distributions directly from pairwise comparison data. Using these estimates, the algorithm constructs a policy that satisfies foundational axioms from social choice theory, namely monotonicity and Pareto efficiency, as well as our newly-introduced axioms of population-proportional alignment and population-bounded manipulability.…
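The contrast the abstract draws can be made concrete with a toy example. The sketch below is not the paper's algorithm: the population, the rankings, and every function name are hypothetical, chosen only to illustrate how a majority-style rule can concentrate on one alternative while pairwise comparison data still constrain the underlying first-choice shares. It relies on one elementary fact: anyone who ranks an alternative first prefers it to every other alternative, so that alternative's first-choice share cannot exceed its smallest pairwise win rate.

```python
"""Toy illustration: majority-style aggregation vs. first-choice shares.

All names and numbers are hypothetical; this is not the paper's algorithm.
"""

# Hypothetical population: three evaluator groups, each with a strict ranking
# over alternatives A, B, C (best first) and a population share.
RANKINGS = [("A", "B", "C"), ("B", "C", "A"), ("C", "B", "A")]
SHARES = [0.40, 0.35, 0.25]


def pairwise_matrix(rankings, shares):
    """Return P where P[a][b] is the population fraction ranking a above b."""
    alts = sorted(rankings[0])
    P = {a: {b: 0.0 for b in alts if b != a} for a in alts}
    for ranking, share in zip(rankings, shares):
        for i, a in enumerate(ranking):
            for b in ranking[i + 1:]:
                P[a][b] += share
    return P


def condorcet_winner(P):
    """Return the alternative that beats every other by majority, or None."""
    for a, row in P.items():
        if all(p > 0.5 for p in row.values()):
            return a
    return None


def first_choice_shares(rankings, shares):
    """True fraction of the population ranking each alternative first."""
    out = {a: 0.0 for a in rankings[0]}
    for ranking, share in zip(rankings, shares):
        out[ranking[0]] += share
    return out


def first_choice_upper_bounds(P):
    """Upper bound on each first-choice share using pairwise data only.

    Anyone who ranks a first prefers a to every b, so the first-choice
    share of a cannot exceed the minimum over b of P[a][b].
    """
    return {a: min(row.values()) for a, row in P.items()}


P = pairwise_matrix(RANKINGS, SHARES)
WINNER = condorcet_winner(P)
TRUE_SHARES = first_choice_shares(RANKINGS, SHARES)
BOUNDS = first_choice_upper_bounds(P)

if __name__ == "__main__":
    print("Condorcet winner:", WINNER)
    print("First-choice shares:", TRUE_SHARES)
    print("Upper bounds from pairwise data:", BOUNDS)
```

In this toy population a winner-take-all rule selects B, even though the largest single group (40%) ranks A first. The pairwise bounds restrict the first-choice shares without pinning them down, which is one simple way to picture a feasible set of population distributions inferred from comparison data.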

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

* This paper studies a fundamental and important problem, reveals the shortcomings of existing algorithms, and proposes a desired method through solid theoretical analyses, which is insightful and meaningful.
* This paper has a clear structure and clear writing, making it easy to understand. I enjoyed reading the paper.

Weaknesses

I found no obvious flaw in this paper. My only concern is that it mostly studies social choice theory in the context of preference learning, and therefore may not be of interest to most audiences in the ML community.

Reviewer 02 · Rating 2 · Confidence 5

Strengths

The main strength is the attempt to do distributional/proportional alignment based on preference structure instead of pre-assigned group labels.

Weaknesses

This is a weak paper as is. The main “contributions” rely on a narrow, first-choice notion of proportionality and a misnamed “strategyproofness” that is really an over-representation cap. Core claims are overstated, proofs are straightforward given the simplified setup, and the paper overlooks major, directly relevant social-choice literature.

1. If “alignment” means matching users’ top choices, supervised fine-tuning with annotators providing preferred options (either in free form text or by c

Reviewer 03 · Rating 8 · Confidence 3

Strengths

* The authors tackle a key problem in alignment: underrepresentation of minority-class preferences.
* I did not rigorously check the math, but I was able to follow the equations and to my eye they seemed correct.
* The tradeoffs of the method are clear; the authors also include a parameter to trade off between PPA (a metric they introduce) and Condorcet consistency (more on that in weaknesses).
* Ultimately this is a well-written, well-justified paper tackling a key problem in alignment. The s

Weaknesses

* I felt that the experiments section had 3 major deficits:
** The MovieLens results seem somewhat underdeveloped. In many ways this seems to be a fairly ideal case, with potentially large disagreement. The results provided seem to back this, but they are given only briefly in sentence form, without tables or more specific intermediate values, making it hard to get a sense for how robust or reliable those values are.
** The baselines seem relatively reasonable, but given the framing

Taxonomy

Topics: Economic Policies and Impacts