MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

Jingyan Shen; Jiarui Yao; Rui Yang; Yifan Sun; Feng Luo; Rui Pan; Tong Zhang; Han Zhao

arXiv:2505.24846·cs.AI·September 24, 2025

TL;DR

MiCRo is a two-stage framework that improves personalized preference learning for large language models by modeling diverse human preferences with mixture models and context-aware routing, without needing detailed annotations.

Contribution

MiCRo introduces a novel mixture modeling and context-aware routing approach for scalable, personalized preference learning from large-scale binary preference data.

Findings

01. Effectively captures diverse human preferences.

02. Significantly improves downstream personalization.

03. Adapts dynamically with minimal supervision.

Abstract

Reward modeling is a key step in building safe foundation models when applying reinforcement learning from human feedback (RLHF) to align Large Language Models (LLMs). However, reward modeling based on the Bradley-Terry (BT) model assumes a global reward function, failing to capture the inherently diverse and heterogeneous human preferences. Hence, such oversimplification limits LLMs from supporting personalization and pluralistic alignment. Theoretically, we show that when human preferences follow a mixture distribution of diverse subgroups, a single BT model has an irreducible error. While existing solutions, such as multi-objective learning with fine-grained annotations, help address this issue, they are costly and constrained by predefined attributes, failing to fully capture the richness of human values. In this work, we introduce MiCRo, a two-stage framework that enhances…
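To make the abstract's contrast concrete, the following is a minimal sketch, not the authors' implementation. It compares a single Bradley-Terry (BT) preference probability with a mixture of BT heads whose weights come from a context-dependent router. The function names, the two-subgroup setup, and all reward and logit values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function used by the Bradley-Terry model."""
    return 1.0 / (1.0 + math.exp(-x))


def bt_prob(reward_a: float, reward_b: float) -> float:
    """BT probability that response a is preferred over response b."""
    return sigmoid(reward_a - reward_b)


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over router logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_prob(router_logits: list[float],
                 rewards_a: list[float],
                 rewards_b: list[float]) -> float:
    """Mixture of BT heads: sum_k w_k * sigmoid(r_k(a) - r_k(b)).

    router_logits stands in for the output of a context-aware router
    (hypothetical values here); rewards_a[k] and rewards_b[k] are the
    rewards that head k assigns to responses a and b.
    """
    weights = softmax(router_logits)
    return sum(w * bt_prob(ra, rb)
               for w, ra, rb in zip(weights, rewards_a, rewards_b))


# Two hypothetical subgroups with opposite preferences over the same pair.
rewards_a = [2.0, -2.0]
rewards_b = [0.0, 0.0]

# Context-free, equal weights: the population-level preference is a coin
# flip, even though each subgroup has a clear preference.
uniform = mixture_prob([0.0, 0.0], rewards_a, rewards_b)

# A router that favours head 0 for this context recovers that subgroup's
# preference.
routed = mixture_prob([3.0, 0.0], rewards_a, rewards_b)
```

In this toy setup, fitting one global reward to the pooled data can at best match the population-level probability, which is uninformative about either subgroup. The abstract's irreducible-error claim is about this kind of heterogeneity; the formal statement is in the paper itself.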

Taxonomy

Topics: Data Management and Algorithms · Recommender Systems and Techniques · Image Retrieval and Classification Techniques

Methods: ALIGN