Rethinking Diverse Human Preference Learning through Principal Component Analysis

Feng Luo; Rui Yang; Hao Sun; Chunyuan Deng; Jiarui Yao; Jingyan Shen; Huan Zhang; Hanjie Chen

arXiv:2502.13131 · cs.AI · June 12, 2025

TL;DR

This paper introduces Decomposed Reward Models (DRMs), a novel PCA-based method to extract and interpret diverse human preferences from binary comparisons, enabling scalable, personalized, and interpretable alignment of language models.

Contribution

The paper presents DRMs, a PCA-based approach that captures the diversity of human preferences from binary comparisons alone, without fine-grained annotations, and that makes reward models easier to interpret and personalize.

Findings

01 DRMs effectively identify meaningful preference dimensions like helpfulness and safety.

02 DRMs can adapt to new users without additional training.

03 The approach offers an interpretable alternative to traditional reward models.

Abstract

Understanding human preferences is crucial for improving foundation models and building personalized AI systems. However, preferences are inherently diverse and complex, making it difficult for traditional reward models to capture their full range. While fine-grained preference data can help, collecting it is expensive and hard to scale. In this paper, we introduce Decomposed Reward Models (DRMs), a novel approach that extracts diverse human preferences from binary comparisons without requiring fine-grained annotations. Our key insight is to represent human preferences as vectors and analyze them using Principal Component Analysis (PCA). By constructing a dataset of embedding differences between preferred and rejected responses, DRMs identify orthogonal basis vectors that capture distinct aspects of preference. These decomposed rewards can be flexibly combined to align with different…
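The decomposition described in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the function names (`decompose_preferences`, `decomposed_rewards`, `combined_reward`), the mean-centering step (standard for PCA, but not confirmed by the abstract), and the linear weighting used to combine components are all assumptions made for the example.

```python
import numpy as np


def decompose_preferences(chosen_emb, rejected_emb, n_components):
    """PCA over embedding differences between preferred and rejected responses.

    Hypothetical helper: returns the top principal directions (rows are
    orthonormal) and the fraction of variance each one explains.
    """
    diffs = chosen_emb - rejected_emb                      # (n_pairs, dim)
    # Assumption: centre the differences, as in textbook PCA.
    centered = diffs - diffs.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, sorted by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return vt[:n_components], explained[:n_components]


def decomposed_rewards(embeddings, components):
    """Score each response embedding along every principal direction."""
    return embeddings @ components.T                       # (n, n_components)


def combined_reward(embeddings, components, weights):
    """Assumed combination rule: a weighted sum of the decomposed rewards."""
    return decomposed_rewards(embeddings, components) @ weights


# Synthetic stand-ins for response embeddings (not real preference data).
rng = np.random.default_rng(0)
chosen = rng.normal(size=(200, 16))
rejected = rng.normal(size=(200, 16))

components, explained = decompose_preferences(chosen, rejected, n_components=4)
user_weights = np.array([0.5, 0.3, 0.2, 0.0])              # hypothetical user profile
scores = combined_reward(chosen, components, user_weights)
```

Under these assumptions, serving a different user only means choosing a different `user_weights` vector; the components themselves are computed once and reused.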

Code & Models

Repositories

holarissun/rewardmodelingbeyondbradleyterry
pytorch

Videos

Rethinking Diverse Human Preference Learning through Principal Component Analysis · underline

Taxonomy

Topics: Color perception and design

Methods: ALIGN