AMoPO: Adaptive Multi-objective Preference Optimization without Reward Models and Reference Models

Qi Liu; Jingqing Ruan; Hao Li; Haodong Zhao; Desheng Wang; Jiansong Chen; Wan Guanglu; Xunliang Cai; Zhi Zheng; Tong Xu

arXiv:2506.07165·cs.LG·June 10, 2025

TL;DR

AMoPO introduces a novel multi-objective optimization framework for aligning large language models with diverse preferences without relying on reward or reference models, using adaptive weighting based on a Gaussian distribution.

Contribution

The paper presents AMoPO, a new adaptive multi-objective preference optimization method that dynamically balances preference dimensions without auxiliary models, improving alignment efficiency and scalability.

Findings

1. Outperforms state-of-the-art baselines by 28.5%
2. Effective across models of 7B, 14B, and 32B parameters
3. Demonstrates strong adaptability and preference dimension management

Abstract

Existing multi-objective preference alignment methods for large language models (LLMs) face two limitations: (1) an inability to effectively balance various preference dimensions, and (2) a reliance on auxiliary reward/reference models, which introduces computational complexity. To address these challenges, we propose Adaptive Multi-objective Preference Optimization (AMoPO), a novel framework that achieves dynamic balance across preference dimensions. By using dimension-aware generation metrics as implicit rewards within a multi-objective optimization paradigm, AMoPO aligns LLMs with diverse preferences without additional reward models or reference models. We introduce an adaptive weight assignment mechanism that models the generation space as a Gaussian distribution, allowing dynamic prioritization of preference dimensions. Empirical results demonstrate that AMoPO outperforms…
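The general idea of a reference-free, multi-objective preference loss with Gaussian-based weights can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's formulation: the per-dimension implicit rewards are taken to be length-normalized log-likelihoods (a choice borrowed from reference-free methods such as SimPO), the weighting rule in `gaussian_weights` is a hypothetical stand-in for the adaptive weight assignment, and the names `amopo_style_loss`, `beta`, and `gamma` are illustrative only.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_weights(scores):
    # Hypothetical weighting rule (an assumption, not the paper's):
    # fit a Gaussian to the per-dimension scores, weight each dimension
    # by its unnormalized density, then normalize the weights to sum to 1.
    n = len(scores)
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n) + 1e-8
    density = [math.exp(-0.5 * ((s - mean) / std) ** 2) for s in scores]
    total = sum(density)
    return [d / total for d in density]


def amopo_style_loss(chosen, rejected, beta=2.0, gamma=0.5):
    # chosen / rejected: one implicit reward per preference dimension,
    # assumed here to be length-normalized log-likelihoods of the
    # preferred and dispreferred responses. No reward model or
    # reference model appears anywhere in the computation.
    weights = gaussian_weights(chosen)
    margin = sum(w * (c - r) for w, c, r in zip(weights, chosen, rejected))
    # -log(sigmoid(z)) equals softplus(-z); gamma is a target margin.
    return softplus(-(beta * margin - gamma))


# Illustrative values for three preference dimensions (made up).
chosen_rewards = [-0.5, -0.6, -0.4]
rejected_rewards = [-1.0, -1.2, -0.9]
loss = amopo_style_loss(chosen_rewards, rejected_rewards)
```

In this sketch the weights are recomputed from the current scores on every call, which is one simple way to make the balance between dimensions dynamic; the actual mechanism may differ.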

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Multi-Objective Optimization Algorithms · Machine Learning and Data Classification