PIPA: Preference Alignment as Prior-Informed Statistical Estimation

Junbo Li; Zhangyang Wang; Qiang Liu

arXiv:2502.05773 · cs.LG · July 28, 2025

Open Access

TL;DR

PIPA is a unified probabilistic framework for offline preference alignment of language models. By integrating prior information, it improves performance on GSM8K and MATH by 3-10% without extra training cost.

Contribution

PIPA formulates preference alignment as a prior-informed maximum likelihood estimation (MLE) problem. This unifies existing algorithms such as DPO and KTO and yields two new variants, PIPA-M and PIPA-N, with improved performance.

Findings

1. Achieves 3-10% performance improvements on GSM8K and MATH benchmarks.
2. Unifies existing offline preference algorithms under a probabilistic framework.
3. Enhances performance without additional training or computational costs.

Abstract

Offline preference alignment methods for language models, such as Direct Preference Optimization (DPO), are favored for their effectiveness and simplicity, eliminating the need for costly reinforcement learning. Various offline algorithms have been developed for different data settings, yet they lack a unified understanding. In this study, we introduce Prior-Informed Preference Alignment (PIPA), a unified, RL-free probabilistic framework that formulates language model preference alignment as a Maximum Likelihood Estimation (MLE) problem with prior constraints. This method accommodates both paired and unpaired data, as well as answer-level and step-level annotations. We show that DPO and KTO are special cases of our framework with different prior constraints. By integrating different types of prior information, we developed two variations of PIPA: PIPA-M and PIPA-N. Both algorithms…
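
For readers unfamiliar with the baseline named in the abstract, the sketch below shows the widely used DPO loss for a single preference pair. It is the standard DPO objective only, not PIPA's objective or either of its variants. The function names, the scalar sequence-level log-probability inputs, and the default `beta` value are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # Branch on the sign of x so that exp() never receives a large positive argument.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed illustrative value, not taken from the paper
) -> float:
    """Standard DPO loss for one (chosen, rejected) answer pair.

    Each argument is a sequence-level log-probability: log pi(y | x) under
    the policy being trained, or under the frozen reference model (ref_*).
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # answer over the rejected one, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # Negative log-likelihood of the observed preference under a logistic model.
    return -log_sigmoid(beta * margin)
```

When the policy equals the reference model the margin is zero and the loss is log 2. The loss falls below log 2 as the policy shifts probability toward the chosen answer and rises above it in the opposite case.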

Taxonomy

Topics: Multi-Criteria Decision Making · Data Management and Algorithms · Bayesian Modeling and Causal Inference