No Preference Left Behind: Group Distributional Preference Optimization

Binwei Yao; Zefan Cai; Yun-Shiuan Chuang; Shanglin Yang; Ming Jiang; Diyi Yang; Junjie Hu

arXiv:2412.20299·cs.CL·May 14, 2025

TL;DR

This paper introduces GDPO, a new method for aligning language models with the diverse preferences within a group by modeling and incorporating belief distributions, improving over existing methods like DPO.

Contribution

GDPO is a novel framework that aligns language models with group preference distributions by modeling beliefs, addressing limitations of existing alignment methods.

Findings

01

GDPO reduces the alignment gap with group preferences during training.

02

GDPO outperforms existing methods in aligning with distributional preferences.

03

Experiments show GDPO effectively captures diverse opinions in synthetic and real-world datasets.

Abstract

Preferences within a group of people are not uniform but follow a distribution. While existing alignment methods like Direct Preference Optimization (DPO) attempt to steer models to reflect human preferences, they struggle to capture the distributional pluralistic preferences within a group. These methods often skew toward dominant preferences, overlooking the diversity of opinions, especially when conflicting preferences arise. To address this issue, we propose Group Distributional Preference Optimization (GDPO), a novel framework that aligns language models with the distribution of preferences within a group by incorporating the concept of beliefs that shape individual preferences. GDPO calibrates a language model using statistical estimation of the group's belief distribution and aligns the model with belief-conditioned preferences, offering a more inclusive alignment framework than…
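The two ingredients named in the abstract, calibration against the group's belief distribution and alignment with belief-conditioned preferences, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of cross-entropy for calibration, the standard DPO pairwise loss for the preference term, and the unweighted sum of the two are all assumptions made for illustration, since this page does not state the exact objective.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def calibration_loss(belief_logits, group_belief_dist):
    """Cross-entropy between the group's empirical belief distribution
    and the model's predicted belief distribution (assumed form)."""
    probs = softmax(belief_logits)
    return -sum(p * math.log(q)
                for p, q in zip(group_belief_dist, probs) if p > 0)

def dpo_pair_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) pair.
    In a belief-conditioned setting, the log-probabilities would be
    computed with the belief included in the context (assumption)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))

def gdpo_like_loss(belief_logits, group_belief_dist, pair_logps, beta=0.1):
    """Hypothetical combined objective: calibration term plus a
    belief-conditioned preference term, summed without weighting."""
    return (calibration_loss(belief_logits, group_belief_dist)
            + dpo_pair_loss(*pair_logps, beta=beta))

# A group split 70/30 between two beliefs: a model whose belief
# distribution matches the split incurs a lower calibration loss than
# one that collapses onto the majority belief.
matched = calibration_loss([math.log(0.7), math.log(0.3)], [0.7, 0.3])
collapsed = calibration_loss([5.0, -5.0], [0.7, 0.3])
print(matched < collapsed)
```

The calibration term is what penalizes collapsing onto the majority belief; the pairwise term alone has no such pressure, which is consistent with the skew toward dominant preferences that the abstract attributes to DPO.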

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 3 · Confidence 3

Strengths

It is important to account for minority preferences in current LLMs, since LLMs tend to respond with the dominant preferences of the majority. The proposed method is conceptually simple and, based on the details given, easy to implement. The paper is well presented and easy to follow.

Weaknesses

While the motivation to account for minority preferences is crucial, the proposed GDPO might not sufficiently fulfill it. 1. The "belief" distribution is predefined, which makes it hard to take a wide range of preferences into account. 2. At inference time, a "belief" is selected first. The selected "belief" could also overlook the preferences of the minority. 3. In the movie review experiment, the "belief" is implemented with rating scores. However, the rating routine of different

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper introduces a novel group-wise perspective in preference optimization, which significantly enhances the effectiveness and practicality of fine-tuning methods compared to existing DPO approaches that often skew towards dominant preferences. 2. The writing is direct and concise, making the paper easy to read and understand. The authors effectively convey complex ideas and methodologies. 3. The experimental design is precise and well-aligned with the core objectives outlined in the in

Weaknesses

- **Belief Set Design**: The need to design specific belief sets for each dataset based on its domain characteristics may limit the scalability and generalizability of the proposed Group Distributional Preference Optimization framework. This requirement adds an additional layer of complexity and could be a barrier to broader adoption.
- **Training Efficiency**: The training process for GDPO involves calculating the calibration loss $l_{\text{cal.}}$ for each belief in the set, leading to a sig

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. This work studies an interesting problem. 2. The proposed method is simple and easily implemented. 3. Extensive experiments on both synthetic and real-world datasets are conducted to validate the effectiveness of the proposed method. 4. The paper is well written.

Weaknesses

1. The technical contribution appears limited. The proposed method is a simple extension of Direct Preference Optimization (DPO), and the authors do not provide substantial insights to reveal the intricate properties of the proposed method. I would suggest the authors conduct more analyses to demonstrate why the proposed strategy is crucial, potentially even a "game-changer" in this field. Theoretical analyses would be particularly beneficial. 2. Another concern pertains to the applica

Code & Models

Repositories

BigBinnie/GDPO
pytorch · Official

Videos

No Preference Left Behind: Group Distributional Preference Optimization· slideslive

Taxonomy

Topics: Decision-Making and Behavioral Economics · Economic and Environmental Valuation

Methods: Direct Preference Optimization · ALIGN