A Probability--Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors

Naaman Tan; Josef Valvoda; Tianyu Liu; Anej Svete; Yanxia Qin; Min-Yen Kan; Ryan Cotterell

arXiv:2406.10203·cs.CL·October 29, 2024

TL;DR

This paper explores the trade-off between string quality and probability in aligned language models, showing how sampling methods influence the balance between human-preferred quality and model likelihood.

Contribution

It provides a formal analysis of the probability--quality trade-off in aligned language models and shows how sampling adaptors can control this balance.

Findings

01. A trade-off exists between the average reward of strings sampled from an aligned model and their average log-likelihood under the prior (pre-alignment) model.

02. Sampling adaptors can tune the balance between quality and probability.

03. A formal framework for the probability--quality relationship in human-aligned models.
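To make the second finding concrete, the sketch below applies two common sampling adaptors, temperature scaling and top-k truncation, to a toy aligned distribution and reports the exact average reward and average prior log-likelihood of each resulting distribution. Everything here is a hypothetical illustration, not the paper's experimental setup: the four "strings", their prior probabilities, the reward values, and beta are invented, and the aligned model is assumed to be the prior tilted toward reward. The adaptors also act on whole strings for simplicity, whereas in practice they act per token on next-symbol distributions.

```python
import math

# Hypothetical toy setup: a "prior" model over four whole strings and invented rewards.
prior = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
reward = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}
beta = 1.0  # assumed KL-regularization strength


def normalize(weights):
    """Rescale non-negative weights so they sum to one."""
    z = sum(weights.values())
    return {y: w / z for y, w in weights.items()}


# Assumed aligned model: the prior tilted toward reward, p_prior(y) * exp(r(y) / beta) / Z.
aligned = normalize({y: p * math.exp(reward[y] / beta) for y, p in prior.items()})


def temperature(dist, t):
    """Temperature adaptor: raise each probability to 1/t and renormalize.
    t < 1 sharpens the distribution, t > 1 flattens it."""
    return normalize({y: p ** (1.0 / t) for y, p in dist.items()})


def top_k(dist, k):
    """Top-k adaptor: keep the k most probable strings and renormalize."""
    kept = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return normalize(dict(kept))


def averages(dist):
    """Exact average reward and average log-likelihood under the prior model."""
    avg_r = sum(p * reward[y] for y, p in dist.items())
    avg_ll = sum(p * math.log(prior[y]) for y, p in dist.items())
    return avg_r, avg_ll


adaptors = {
    "no adaptor": aligned,
    "temperature 5.0": temperature(aligned, 5.0),
    "temperature 0.5": temperature(aligned, 0.5),
    "top-1": top_k(aligned, 1),
}
for name, dist in adaptors.items():
    avg_r, avg_ll = averages(dist)
    print(f"{name:>16}: avg reward {avg_r:.3f}, avg prior log-lik {avg_ll:.3f}")
```

In this toy, sharpening (low temperature, top-1) moves mass toward the high-reward but low-prior-probability string, while flattening does the reverse, so each adaptor picks a different point on the reward versus prior log-likelihood curve.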

Abstract

The relationship between the quality of a string, as judged by a human reader, and its probability, $p(y)$, under a language model undergirds the development of better language models. For example, many popular algorithms for sampling from a language model have been conceived with the goal of manipulating $p(y)$ to place higher probability on strings that humans deem of high quality. In this article, we examine the probability--quality relationship in language models explicitly aligned to human preferences, e.g., through reinforcement learning from human feedback. We show that, when sampling corpora from an aligned language model, there exists a trade-off between the strings' average reward and average log-likelihood under the prior language model, i.e., the same model before alignment with human preferences. We provide a formal treatment of this phenomenon…
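One way to see why such a trade-off can arise is sketched below. It assumes, as an illustration rather than as the paper's stated derivation, that the aligned model equals the well-known closed-form optimum of the KL-regularized RLHF objective, $p_{\text{aligned}}(y) \propto p_{\text{prior}}(y)\exp(r(y)/\beta)$. Under that assumption every string satisfies $\log p_{\text{prior}}(y) + r(y)/\beta = \log p_{\text{aligned}}(y) + \log Z$, so averaged over any corpus, the mean prior log-likelihood plus the mean reward divided by $\beta$ equals the mean aligned log-likelihood plus $\log Z$; holding the right-hand side fixed, one average can only rise if the other falls. The strings, probabilities, rewards, and beta below are hypothetical toy values.

```python
import math

# Hypothetical toy setting: the "prior" model is a distribution over four whole strings,
# and a hypothetical reward model assigns each string a score.
prior = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
reward = {"a": 0.0, "b": 1.0, "c": 2.0, "d": 3.0}
beta = 1.0  # assumed KL-regularization strength


def align(prior, reward, beta):
    """Tilt the prior toward reward: p_aligned(y) = p_prior(y) * exp(r(y) / beta) / Z."""
    weights = {y: p * math.exp(reward[y] / beta) for y, p in prior.items()}
    z = sum(weights.values())
    return {y: w / z for y, w in weights.items()}, z


def expectation(dist, f):
    """Exact expectation of f(y) under a distribution stored as a dict."""
    return sum(p * f(y) for y, p in dist.items())


aligned, z = align(prior, reward, beta)

# Per-string identity implied by the tilted form:
# log p_prior(y) + r(y) / beta == log p_aligned(y) + log Z for every string y.
for y in prior:
    lhs = math.log(prior[y]) + reward[y] / beta
    rhs = math.log(aligned[y]) + math.log(z)
    assert math.isclose(lhs, rhs)

# Compare average reward and average prior log-likelihood before and after alignment.
for name, dist in [("prior", prior), ("aligned", aligned)]:
    avg_r = expectation(dist, lambda s: reward[s])
    avg_ll = expectation(dist, lambda s: math.log(prior[s]))
    print(f"{name}: avg reward {avg_r:.3f}, avg prior log-likelihood {avg_ll:.3f}")
```

The per-string identity is exact under the tilted form; whether a real aligned model is close to that optimum is an assumption this sketch does not test.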


Taxonomy

Topics: Natural Language Processing Techniques