Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Mingzhi Wang, Chengdong Ma, Qizhi Chen, Linjian Meng, Yang Han,, Jiancong Xiao, Zhaowei Zhang, Jing Huo, Weijie J. Su, Yaodong Yang

TL;DR
This paper introduces Magnetic Preference Optimization (MPO), a new method that guarantees last-iterate convergence to the Nash equilibrium in preference-based games, improving language model alignment with theoretical guarantees and practical efficiency.
Contribution
MPO extends Magnetic Mirror Descent to preference optimization, achieving last-iterate convergence and linear rates, addressing limitations of existing methods in LLM fine-tuning.
Findings
MPO achieves last-iterate convergence to the NE.
MPO demonstrates linear convergence rate.
Empirical results show improved LLM performance.
Abstract
Self-play methods have demonstrated remarkable success in enhancing model capabilities across various domains. In the context of Reinforcement Learning from Human Feedback (RLHF), self-play not only boosts Large Language Model (LLM) performance but also overcomes the limitations of traditional Bradley-Terry (BT) model assumptions by finding the Nash equilibrium (NE) of a preference-based, two-player constant-sum game. However, existing methods either guarantee only average-iterate convergence, incurring high storage and inference costs, or converge to the NE of a regularized game, failing to accurately reflect true human preferences. In this paper, we introduce Magnetic Preference Optimization (MPO), a novel approach capable of achieving last-iterate convergence to the NE of the original game, effectively overcoming the limitations of existing methods. Building upon Magnetic Mirror…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
