Many of Your DPOs are Secretly One: Attempting Unification Through Mutual Information
Rasul Tutnov, Antoine Grosnit, Haitham Bou-Ammar

TL;DR
This paper introduces a unifying framework for preference optimization in large language models, based on mutual information, that clarifies how the many DPO variants relate to one another and supports future alignment research.
Contribution
The paper proposes a flexible loss-function framework that recovers many existing DPO algorithms through the choice of priors, making LLM alignment methods easier to interpret and to develop.
Findings
Many DPO variants can be derived from the proposed framework
The framework clarifies relationships between different alignment algorithms
The framework may enable more robust and interpretable alignment techniques
Abstract
Post-alignment of large language models (LLMs) is critical for improving their utility, safety, and alignment with human intentions. Direct preference optimisation (DPO) has become one of the most widely used algorithms for achieving this alignment, given its ability to optimise models directly on human feedback. However, the vast number of DPO variants in the literature has made it increasingly difficult for researchers to navigate and fully grasp the connections between these approaches. This paper introduces a unifying framework inspired by mutual information, which proposes a new loss function with flexible priors. By carefully specifying these priors, we demonstrate that many existing algorithms, such as SimPO, TDPO, SparsePO, and others, can be derived from our framework. This unification offers a clearer and more structured approach, allowing researchers to understand the…
Taxonomy
Topics: Constraint Satisfaction and Optimization · Multimodal Machine Learning Applications · Machine Learning and Data Classification
Methods: Direct Preference Optimization
