Aligning Language Models with Preferences through f-divergence Minimization
Dongyoung Go, Tomasz Korbak, Germán Kruszewski, Jos Rozen, Nahyeon Ryu, Marc Dymetman

TL;DR
This paper introduces f-DPG, a flexible framework for aligning language models with preferences using any f-divergence, unifying existing methods and demonstrating that divergence choice impacts alignment and diversity trade-offs.
Contribution
Proposes f-DPG, an approach that generalizes existing alignment methods by letting the model approximate any target distribution that can be evaluated, under any f-divergence, so the divergence can be chosen to improve alignment performance.
Findings
Jensen-Shannon divergence balances alignment and diversity effectively.
Different divergences offer distinct trade-offs in model alignment.
Jensen-Shannon divergence outperforms forward KL in experiments.
Abstract
Aligning language models with preferences can be posed as approximating a target distribution representing some desired behavior. Existing approaches differ both in the functional form of the target distribution and the algorithm used to approximate it. For instance, Reinforcement Learning from Human Feedback (RLHF) corresponds to minimizing a reverse KL from an implicit target distribution arising from a KL penalty in the objective. On the other hand, Generative Distributional Control (GDC) has an explicit target distribution and minimizes a forward KL from it using the Distributional Policy Gradient (DPG) algorithm. In this paper, we propose a new approach, f-DPG, which allows the use of any f-divergence to approximate any target distribution that can be evaluated. f-DPG unifies both frameworks (RLHF, GDC) and the approximation methods (DPG, RL with KL penalties). We show the…
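To make the forward/reverse KL distinction in the abstract concrete, here is a minimal sketch of f-divergences on small discrete distributions. It assumes the common textbook definition D_f(p1 || p2) = sum_x p2(x) f(p1(x) / p2(x)); the paper's own convention may differ, and the function names and the toy `target` and `policy` vectors are illustrative assumptions, not taken from the paper. It only evaluates divergences and does not implement the f-DPG training algorithm.

```python
import math

def f_divergence(p1, p2, f):
    """D_f(p1 || p2) = sum_x p2(x) * f(p1(x) / p2(x)); assumes p2(x) > 0 for all x."""
    return sum(b * f(a / b) for a, b in zip(p1, p2))

def gen_reverse_kl(t):
    # f(t) = t log t gives KL(p1 || p2); with p1 = policy, p2 = target this is
    # the reverse KL that the abstract associates with RLHF.
    return t * math.log(t) if t > 0 else 0.0

def gen_forward_kl(t):
    # f(t) = -log t gives KL(p2 || p1); with p1 = policy, p2 = target this is
    # the forward KL that the abstract associates with GDC. Requires t > 0.
    return -math.log(t)

def gen_js(t):
    # Jensen-Shannon: 0.5 * KL(p1 || m) + 0.5 * KL(p2 || m), m = (p1 + p2) / 2,
    # rewritten in terms of the ratio t = p1 / p2.
    left = t * math.log(2 * t / (1 + t)) if t > 0 else 0.0
    return 0.5 * (left + math.log(2 / (1 + t)))

# Hypothetical toy distributions over three outcomes.
target = [0.7, 0.2, 0.1]
policy = [0.4, 0.4, 0.2]

for name, gen in [("reverse KL", gen_reverse_kl),
                  ("forward KL", gen_forward_kl),
                  ("Jensen-Shannon", gen_js)]:
    print(name, f_divergence(policy, target, gen))
```

Swapping the generator function is the only change needed to move between divergences, which is the sense in which a single objective family can cover both the RLHF-style and GDC-style cases.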
Taxonomy
Topics: Topic Modeling · Reinforcement Learning in Robotics · Natural Language Processing Techniques
