Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework
Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, Pablo A. Parrilo

TL;DR
This paper introduces a new preference learning framework that aligns policies proportionally with true population preferences, addressing biases and manipulation issues in existing methods, and is validated on recommendation and language model tasks.
Contribution
The paper develops an axiomatic, population-proportional preference aggregation framework that infers evaluator distributions and constructs aligned policies, incorporating novel axioms and a soft-max relaxation method.
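Pairwise comparison data limit which evaluator populations are possible. One elementary consequence is sketched below; it is not the authors' estimator. Anyone who ranks alternative a first prefers a to every other b, so the share of evaluators whose top choice is a is at most min over b of P(a ≻ b). This page does not define the soft-max relaxation. The code assumes the standard log-sum-exp smoothing of that minimum, controlled by a hypothetical temperature `tau`. The win-rate matrix `P` is also hypothetical: it was worked out by hand from a population in which 40% rank A > B > C, 35% rank B > C > A, and 25% rank C > A > B.

```python
import math

# Hypothetical win rates: P[a][b] = share of evaluators preferring a over b.
# Alternatives are indexed 0 = A, 1 = B, 2 = C; the diagonal is unused.
P = [
    [0.00, 0.65, 0.40],
    [0.35, 0.00, 0.75],
    [0.60, 0.25, 0.00],
]


def first_choice_upper_bounds(P):
    """Upper bound on each alternative's first-choice share.

    Every evaluator who ranks a first prefers a to each b != a, so
    share(a ranked first) <= P[a][b] for all b, i.e. <= min_b P[a][b].
    """
    n = len(P)
    return [min(P[a][b] for b in range(n) if b != a) for a in range(n)]


def soft_min(values, tau):
    """Log-sum-exp soft minimum: -tau * log(sum_i exp(-v_i / tau)).

    The result never exceeds min(values) and converges to it as tau -> 0.
    The hard minimum is factored out first for numerical stability.
    """
    m = min(values)
    return m - tau * math.log(sum(math.exp(-(v - m) / tau) for v in values))


def relaxed_upper_bounds(P, tau):
    """Same bounds as above, with the hard min replaced by soft_min (an assumption)."""
    n = len(P)
    return [soft_min([P[a][b] for b in range(n) if b != a], tau) for a in range(n)]


print(first_choice_upper_bounds(P))  # [0.4, 0.35, 0.25]
for tau in (0.1, 0.01):
    print(tau, [round(x, 3) for x in relaxed_upper_bounds(P, tau)])
```

In this cyclic example the bounds happen to equal the true first-choice shares (0.40, 0.35, 0.25). In general they are only upper bounds. The smoothed values always lie slightly below the hard ones and approach them as `tau` shrinks.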
Findings
Effective in aligning policies with true population preferences
Scalable to large language models and recommendation tasks
Reduces bias and manipulation in preference aggregation
Abstract
Conventional preference learning methods often prioritize opinions held more widely when aggregating preferences from multiple evaluators. This may result in policies that are biased in favor of some types of opinions or groups and susceptible to strategic manipulation. To address this issue, we develop a novel preference learning framework capable of aligning aggregate opinions and policies proportionally with the true population distribution of evaluator preferences. Grounded in social choice theory, our approach infers the feasible set of evaluator population distributions directly from pairwise comparison data. Using these estimates, the algorithm constructs a policy that satisfies foundational axioms from social choice theory, namely monotonicity and Pareto efficiency, as well as our newly introduced axioms of population-proportional alignment and population-bounded manipulability.…
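A toy profile illustrates the bias described in the abstract. The sketch below uses a hypothetical population, not data from the paper. It first converts a population of rankings into the pairwise win rates that comparison data would reveal. It then compares two policies. The first puts all probability on the alternative that wins the most pairwise majorities; this Copeland-style rule stands in for majority-driven aggregation and is not the RLHF or NLHF objective itself. The second matches first-choice population shares.

```python
from itertools import permutations

# Hypothetical population: (share of evaluators, ranking from best to worst).
POPULATION = [
    (0.55, ("A", "B", "C")),
    (0.45, ("C", "B", "A")),
]
ALTERNATIVES = ("A", "B", "C")


def pairwise_win_rates(population, alternatives):
    """P[(a, b)] = share of evaluators who rank a above b."""
    P = {(a, b): 0.0 for a, b in permutations(alternatives, 2)}
    for share, ranking in population:
        for i, a in enumerate(ranking):
            for b in ranking[i + 1:]:
                P[(a, b)] += share
    return P


def majority_policy(P, alternatives):
    """All mass on the alternative that wins the most pairwise majorities."""
    wins = {a: sum(P[(a, b)] > 0.5 for b in alternatives if b != a) for a in alternatives}
    best = max(alternatives, key=wins.get)
    return {a: float(a == best) for a in alternatives}


def first_choice_policy(population, alternatives):
    """Probability of each alternative = share of evaluators ranking it first."""
    policy = {a: 0.0 for a in alternatives}
    for share, ranking in population:
        policy[ranking[0]] += share
    return policy


P = pairwise_win_rates(POPULATION, ALTERNATIVES)
print(majority_policy(P, ALTERNATIVES))               # {'A': 1.0, 'B': 0.0, 'C': 0.0}
print(first_choice_policy(POPULATION, ALTERNATIVES))  # {'A': 0.55, 'B': 0.0, 'C': 0.45}
```

The 45% group whose favorite is C gets nothing under the majority rule. The proportional policy gives C probability 0.45.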
Peer Reviews
Decision: ICLR 2026 Poster
* This paper studies a fundamental and important problem, reveals the shortcomings of existing algorithms, and proposes a method with the desired properties backed by solid theoretical analysis, which is insightful and meaningful.
* This paper has a clear structure and clear writing, making it easy to understand. I enjoyed reading the paper.
I found no obvious flaw in this paper. My only concern is that it mostly studies social choice theory in the context of preference learning and therefore may not be of interest to most of the ML community.
The main strength is the attempt to do distributional/proportional alignment based on preference structure rather than on pre-assigned group labels.
This is a weak paper as is. The main “contributions” rely on a narrow, first-choice notion of proportionality and a misnamed “strategyproofness” that is really an over-representation cap. Core claims are overstated, proofs are straightforward given the simplified setup, and the paper overlooks major, directly relevant social-choice literature. 1. If “alignment” means matching users’ top choices, supervised fine-tuning with annotators providing preferred options (either in free form text or by c
* The authors tackle a key problem in alignment: underrepresentation of minority-class preferences.
* I did not rigorously check the math, but I was able to follow the equations, and to my eye they seemed correct.
* The tradeoffs of the method are clear; the authors also include a parameter to trade off between PPA (a metric they introduce) and Condorcet consistency (more on that in the weaknesses).
* Ultimately this is a well-written, well-justified paper tackling a key problem in alignment. The s
* I felt that the experiments section had three major deficits:
** The MovieLens results seem somewhat underdeveloped. In many ways this seems to be a fairly ideal case, with potentially large disagreement. The results provided seem to support this, but they are given only briefly in prose, without tables or more specific intermediate values, making it hard to judge how robust or reliable they are.
** The baselines seem relatively reasonable, but given the framing
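Condorcet consistency, which the review above contrasts with PPA, requires a policy to put all its probability on the Condorcet winner whenever one exists. The Condorcet winner is an alternative preferred to every other by a strict pairwise majority. The hypothetical helpers below check this property on hand-written win rates for a 55%/45% split between the rankings A > B > C and C > B > A. They do not reproduce the paper's tradeoff parameter.

```python
# Hypothetical win rates: P[(a, b)] = share preferring a over b,
# from 55% ranking A > B > C and 45% ranking C > B > A.
P = {
    ("A", "B"): 0.55, ("B", "A"): 0.45,
    ("A", "C"): 0.55, ("C", "A"): 0.45,
    ("B", "C"): 0.55, ("C", "B"): 0.45,
}
ALTERNATIVES = ("A", "B", "C")


def condorcet_winner(alternatives, P):
    """Alternative beating every other one by a strict pairwise majority, else None."""
    for a in alternatives:
        if all(P[(a, b)] > 0.5 for b in alternatives if b != a):
            return a
    return None


def is_condorcet_consistent(policy, alternatives, P, tol=1e-9):
    """True if the policy puts (numerically) all mass on the Condorcet winner,
    or if no Condorcet winner exists, in which case there is nothing to check."""
    winner = condorcet_winner(alternatives, P)
    if winner is None:
        return True
    return policy[winner] >= 1.0 - tol


proportional = {"A": 0.55, "B": 0.0, "C": 0.45}  # first-choice shares
winner_take_all = {"A": 1.0, "B": 0.0, "C": 0.0}
print(condorcet_winner(ALTERNATIVES, P))                          # A
print(is_condorcet_consistent(proportional, ALTERNATIVES, P))     # False
print(is_condorcet_consistent(winner_take_all, ALTERNATIVES, P))  # True
```

On this profile a population-proportional policy gives the 45% group's favorite C positive probability, so it cannot be Condorcet-consistent. This is the tension that a parameter trading off the two properties has to manage.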
Taxonomy
Topics: Economic Policies and Impacts
