MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Guangchen Lan; Sipeng Zhang; Tianle Wang; Yuwei Zhang; Daoan Zhang; Xinpeng Wei; Xiaoman Pan; Hongming Zhang; Dong-Jun Han; Christopher G. Brinton

arXiv:2507.21183·cs.LG·May 11, 2026

TL;DR

MaPPO is a novel preference optimization method that incorporates prior reward knowledge into the learning process, improving alignment of large language models with human preferences.

Contribution

MaPPO generalizes existing preference learning methods by integrating prior reward estimates into a maximum a posteriori framework, enhancing alignment without extra hyperparameters.

Findings

01. MaPPO consistently improves alignment performance across multiple benchmarks.

02. It supports both offline and online preference optimization settings.

03. MaPPO can be used as a plugin for existing DPO variants, yielding better results.

Abstract

As the era of large language models (LLMs) unfolds, Preference Optimization (PO) methods have become a central approach to aligning LLMs with human preferences and improving performance. We propose Maximum a Posteriori Preference Optimization (MaPPO), a methodology for learning from preferences that explicitly incorporates prior reward knowledge into the optimization objective. Building on the paradigm employed by Direct Preference Optimization (DPO) and its variants of treating preference learning as a Maximum Likelihood Estimation (MLE) problem, MaPPO integrates prior reward estimates into a principled Maximum a Posteriori (MaP) objective. This not only generalizes DPO and its variants, but also enhances alignment by mitigating the oversimplified binary classification of responses. Additionally, MaPPO introduces no additional hyperparameters, and supports preference optimization in…
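
To make the MLE-versus-MaP distinction in the abstract concrete, the sketch below computes the standard per-pair DPO loss, then shows one hypothetical way a prior reward estimate could enter that loss. The function `prior_weighted_loss`, its `reward_gap` parameter, and the choice to scale the rejected response's log-ratio are assumptions made for illustration only; the abstract does not state the MaPPO objective, so this should not be read as the authors' formula.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    logp_* are sequence log-probabilities under the policy,
    ref_logp_* under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


def prior_weighted_loss(logp_w: float, logp_l: float,
                        ref_logp_w: float, ref_logp_l: float,
                        reward_gap: float,
                        beta: float = 0.1) -> float:
    """HYPOTHETICAL prior-aware variant, not the paper's stated objective.

    reward_gap is an assumed prior estimate in [0, 1] of how much better
    the chosen response is. It scales the rejected log-ratio, so a small
    gap penalizes the rejected response less; reward_gap = 1 gives DPO.
    """
    margin = beta * ((logp_w - ref_logp_w)
                     - reward_gap * (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # Toy log-probabilities, chosen purely for illustration.
    args = (-12.0, -15.0, -13.0, -14.0)
    print("DPO loss:           ", dpo_loss(*args))
    print("prior-weighted loss:", prior_weighted_loss(*args, reward_gap=0.3))
```

In this sketch, setting `reward_gap` to 1 reproduces the DPO loss exactly, which mirrors the abstract's claim that the MaP formulation generalizes DPO, and no new tunable hyperparameter is introduced because the gap is assumed to come from a prior reward estimate rather than from tuning.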
