Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment

Mingzhi Wang; Chengdong Ma; Qizhi Chen; Linjian Meng; Yang Han; Jiancong Xiao; Zhaowei Zhang; Jing Huo; Weijie J. Su; Yaodong Yang

arXiv:2410.16714·cs.CL·April 22, 2025

TL;DR

This paper introduces Magnetic Preference Optimization (MPO), a method that guarantees last-iterate convergence to the Nash equilibrium of the original preference-based game, so language model alignment gets a theoretical guarantee without the storage and inference cost of averaging iterates.

Contribution

MPO extends Magnetic Mirror Descent to preference optimization, achieving last-iterate convergence and linear rates, addressing limitations of existing methods in LLM fine-tuning.
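
As a rough illustration of the idea, the sketch below runs an entropic magnetic mirror descent update in self-play on a tiny preference game. Everything in it is an assumption made for illustration rather than the paper's implementation: the 3-response preference matrix `P`, the helper names `win_rates`, `mmd_step`, and `self_play`, the step size `eta`, the regularization strength `alpha`, and the choice to reset the magnet to the latest policy after each inner loop. The preferences are cyclic, so the Nash equilibrium of this toy game is the uniform policy (every response then wins with probability 0.5).

```python
import math

# Hypothetical preference matrix: P[i][j] is the probability that response i
# is preferred over response j, so P[i][j] + P[j][i] == 1.
# The preferences are cyclic (non-transitive), so no single scalar score per
# response can reproduce them.
P = [
    [0.5, 0.9, 0.1],
    [0.1, 0.5, 0.9],
    [0.9, 0.1, 0.5],
]


def win_rates(policy):
    """Expected win probability of each response against `policy`."""
    return [sum(p_ij * q for p_ij, q in zip(row, policy)) for row in P]


def mmd_step(policy, magnet, eta, alpha):
    """One magnetic mirror descent step with a negative-entropy mirror map.

    Closed form on the simplex:
    new(a) is proportional to
    [policy(a) * magnet(a)^(eta*alpha) * exp(eta*q(a))]^(1 / (1 + eta*alpha)).
    """
    q = win_rates(policy)
    logits = [
        (math.log(p) + eta * alpha * math.log(m) + eta * q_a) / (1.0 + eta * alpha)
        for p, m, q_a in zip(policy, magnet, q)
    ]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def self_play(initial, outer_steps=200, inner_steps=100, eta=0.1, alpha=0.5):
    """Self-play where the policy plays against a copy of itself."""
    policy = list(initial)
    for _ in range(outer_steps):
        # Assumption for this sketch: the magnet is reset to the latest policy,
        # so the regularization does not permanently bias the solution.
        magnet = list(policy)
        for _ in range(inner_steps):
            policy = mmd_step(policy, magnet, eta, alpha)
    return policy


final_policy = self_play([0.7, 0.2, 0.1])
print([round(p, 3) for p in final_policy])
```

Only the final policy is kept; no running average of past iterates is stored, which is the practical point of last-iterate convergence. Setting `alpha` to 0 turns the step into a plain multiplicative-weights update with no pull toward a magnet.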

Findings

01 MPO achieves last-iterate convergence to the Nash equilibrium (NE).

02 MPO demonstrates a linear convergence rate.

03 Empirical results show improved LLM performance.

Abstract

Self-play methods have demonstrated remarkable success in enhancing model capabilities across various domains. In the context of Reinforcement Learning from Human Feedback (RLHF), self-play not only boosts Large Language Model (LLM) performance but also overcomes the limitations of traditional Bradley-Terry (BT) model assumptions by finding the Nash equilibrium (NE) of a preference-based, two-player constant-sum game. However, existing methods either guarantee only average-iterate convergence, incurring high storage and inference costs, or converge to the NE of a regularized game, failing to accurately reflect true human preferences. In this paper, we introduce Magnetic Preference Optimization (MPO), a novel approach capable of achieving last-iterate convergence to the NE of the original game, effectively overcoming the limitations of existing methods. Building upon Magnetic Mirror…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems