Information Theoretic Guarantees For Policy Alignment In Large Language Models

Youssef Mroueh

arXiv:2406.05883 · cs.LG · June 11, 2024

Open Access

TL;DR

This paper provides information-theoretic bounds on policy alignment in large language models, showing how reward improvements relate to divergence measures under tail assumptions and extending results to various divergences and reward proxies.

Contribution

It establishes new upper bounds on reward improvements for policy alignment using $f$-divergences, including Rényi divergence, under tail assumptions, and connects proxy and true rewards.

Findings

01

Reward improvement of the aligned policy over the reference policy scales with $\sqrt{KL}$.

02

Reward bounds hold under sub-gaussian tail assumptions.

03

Bounds extend to any $f$-divergence via order statistics and data processing inequality.
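The first two findings can be illustrated with the standard transportation-inequality argument. This is a sketch under assumed notation, not necessarily the paper's exact statement or constants: $r$ is the reward, $\pi_{\mathrm{ref}}$ the reference policy, $\pi$ the aligned policy, and $\sigma^2$ a sub-gaussian variance proxy for $r$ under $\pi_{\mathrm{ref}}$.

```latex
% Assumption: r is sub-gaussian under \pi_{ref} with variance proxy \sigma^2
\log \mathbb{E}_{\pi_{\mathrm{ref}}}
  \exp\!\big(\lambda (r - \mathbb{E}_{\pi_{\mathrm{ref}}} r)\big)
  \le \frac{\lambda^2 \sigma^2}{2}
  \quad \text{for all } \lambda \in \mathbb{R}.

% Donsker-Varadhan variational form of KL, applied to \lambda r with \lambda > 0
\lambda \big(\mathbb{E}_{\pi} r - \mathbb{E}_{\pi_{\mathrm{ref}}} r\big)
  \le \mathrm{KL}(\pi \,\|\, \pi_{\mathrm{ref}}) + \frac{\lambda^2 \sigma^2}{2}.

% Dividing by \lambda and minimizing over \lambda > 0
% (optimum at \lambda = \sqrt{2\,\mathrm{KL}}/\sigma)
\mathbb{E}_{\pi} r - \mathbb{E}_{\pi_{\mathrm{ref}}} r
  \le \sqrt{2 \sigma^2 \, \mathrm{KL}(\pi \,\|\, \pi_{\mathrm{ref}})}.
```

The final line is the $\sqrt{KL}$ scaling: any reward gain over the reference policy is paid for with divergence from it, at a rate set by the tail of the reward.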

Abstract

Policy alignment of large language models refers to constrained policy optimization, where the policy is optimized to maximize a reward while staying close to a reference policy with respect to an $f$-divergence such as the $KL$ divergence. The best of $n$ alignment policy selects the sample with the maximum reward among $n$ independent samples from the reference policy. For both cases (policy alignment and best of $n$), recent works showed empirically that the reward improvement of the aligned policy over the reference one scales like $\sqrt{KL}$, with an explicit bound in $n$ on the $KL$ for the best of $n$ policy. We show in this paper that the $\sqrt{KL}$ information-theoretic upper bound holds if the reward under the reference policy has sub-gaussian tails. Moreover, we prove for the best of $n$ policy, that the $KL$ upper bound can be…
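The best of $n$ policy described in the abstract can be checked numerically with a toy simulation. The following is a minimal sketch, not the paper's experiment: it assumes rewards under the reference policy are standard Gaussian (so the sub-gaussian variance proxy is $\sigma^2 = 1$), and it uses the commonly cited expression $\log n - (n-1)/n$ as the bound on the $KL$ of the best of $n$ policy, which the abstract mentions but does not state. All function names are hypothetical.

```python
import math
import random


def best_of_n_reward(n, rng):
    """Draw n i.i.d. rewards from the toy reference policy and keep the max."""
    return max(rng.gauss(0.0, 1.0) for _ in range(n))


def mean_improvement(n, trials, rng):
    """Monte Carlo estimate of E[best-of-n reward] - E[reference reward].

    The reference reward has mean 0 in this toy model, so the improvement
    equals the mean of the best-of-n reward.
    """
    return sum(best_of_n_reward(n, rng) for _ in range(trials)) / trials


def kl_upper_bound(n):
    """Assumed bound on KL(best-of-n || reference): log(n) - (n - 1) / n."""
    return math.log(n) - (n - 1) / n


def sqrt_kl_bound(n, sigma=1.0):
    """Sub-gaussian bound on reward improvement: sqrt(2 * sigma^2 * KL)."""
    return math.sqrt(2.0 * sigma ** 2 * kl_upper_bound(n))


rng = random.Random(0)  # fixed seed so the run is reproducible
results = {}
for n in (2, 4, 16, 64):
    results[n] = (mean_improvement(n, 20000, rng), sqrt_kl_bound(n))
    print(f"n={n:3d}  improvement={results[n][0]:.3f}  bound={results[n][1]:.3f}")
```

In this toy setting the estimated improvement should stay below the bound for every $n$, while both grow roughly like $\sqrt{\log n}$.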

Taxonomy

Topics: Natural Language Processing Techniques