AMPO: Active Multi-Preference Optimization for Self-play Preference Selection

Taneesh Gupta; Rahul Madhavan; Xuchao Zhang; Chetan Bansal; Saravan Rajmohan

arXiv:2502.18293·cs.LG·June 10, 2025

TL;DR

AMPO introduces an active subset selection method for multi-preference optimization in language models. It improves alignment by efficiently identifying diverse, informative responses for training, and reports state-of-the-art results on AlpacaEval with Llama 8B and Mistral 7B.

Contribution

The paper presents a novel active subset selection technique for multi-preference optimization, enhancing language model alignment with theoretical guarantees and empirical improvements.

Findings

01

Achieves state-of-the-art results on AlpacaEval with Llama 8B and Mistral 7B.

02

Provides theoretical guarantees for reward maximization.

03

Effectively identifies diverse response modes for robust training.

Abstract

Multi-preference optimization enriches language-model alignment beyond pairwise preferences by contrasting entire sets of helpful and undesired responses, thereby enabling richer training signals for large language models. During self-play alignment, these models often produce numerous candidate answers per query, rendering it computationally infeasible to include all responses in the training objective. In this work, we propose Active Multi-Preference Optimization (AMPO), a novel approach that combines on-policy generation, a multi-preference group-contrastive loss, and active subset selection. Specifically, we score and embed large candidate pools of responses and then select a small, yet informative, subset that covers reward extremes and distinct semantic clusters for preference optimization. Our contrastive training scheme is capable of identifying not only the best and…
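The selection step described in the abstract (score and embed a candidate pool, then keep a small subset that covers reward extremes and distinct semantic regions) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `select_subset`, its parameters, and the greedy farthest-point heuristic used here as a stand-in for "distinct semantic clusters" are all assumptions made for illustration.

```python
import math


def select_subset(rewards, embeddings, k):
    """Hypothetical helper: pick k candidate indices from a scored, embedded pool.

    Assumption (not from the paper): the highest- and lowest-reward responses
    are kept first, and the remaining slots are filled greedily with the
    candidate farthest (Euclidean distance) from everything already chosen.
    """
    n = len(rewards)
    if len(embeddings) != n:
        raise ValueError("rewards and embeddings must have the same length")
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= number of candidates")

    # Reward extremes: the best and worst scored responses.
    best = max(range(n), key=rewards.__getitem__)
    worst = min(range(n), key=rewards.__getitem__)
    chosen = [best] if best == worst else [best, worst]

    # Distance from every candidate to its nearest already-chosen response.
    nearest = [
        min(math.dist(embeddings[i], embeddings[j]) for j in chosen)
        for i in range(n)
    ]

    # Greedy farthest-point coverage of the embedding space.
    while len(chosen) < k:
        remaining = [i for i in range(n) if i not in chosen]
        nxt = max(remaining, key=nearest.__getitem__)
        chosen.append(nxt)
        for i in range(n):
            nearest[i] = min(nearest[i], math.dist(embeddings[i], embeddings[nxt]))
    return chosen
```

In this sketch the returned indices would feed the group-contrastive loss in place of the full candidate pool. How the paper actually weights reward against coverage is not stated in the truncated abstract, so the ordering used here (extremes first, coverage second) is only one plausible choice.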

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Constraint Satisfaction and Optimization

Methods: LLaMA