A Probability--Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors
Naaman Tan, Josef Valvoda, Tianyu Liu, Anej Svete, Yanxia Qin, Min-Yen Kan, Ryan Cotterell

TL;DR
This paper explores the trade-off between string quality and probability in aligned language models, showing how sampling methods influence the balance between human-preferred quality and model likelihood.
Contribution
It provides a formal analysis of the probability--quality trade-off in aligned language models and shows how the choice of sampling adaptor controls this balance.
Findings
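When sampling corpora from an aligned model, a trade-off exists between the strings' average reward and their average log-likelihood under the prior (pre-alignment) model.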
Sampling adaptors can tune the balance between quality and probability.
The paper gives a formal framework for the probability--quality relationship in human-aligned models.
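To make the term "sampling adaptor" concrete, here is a minimal sketch of two widely used adaptors, temperature scaling and top-k truncation, applied to a single next-token distribution. The function names and the toy distribution are illustrative assumptions, not code from the paper; the paper's own experimental setup may differ.

```python
import math

def temperature_adaptor(probs, temperature):
    """Raise each probability to the power 1/T and renormalize (T < 1 sharpens)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

def top_k_adaptor(probs, k):
    """Keep the k most probable tokens, zero out the rest, and renormalize."""
    if not 1 <= k <= len(probs):
        raise ValueError("k must be between 1 and the vocabulary size")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:k])
    total = sum(probs[i] for i in keep)
    return [probs[i] / total if i in keep else 0.0 for i in range(len(probs))]

def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability tokens."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical next-token distribution over a four-token vocabulary.
next_token_probs = [0.5, 0.3, 0.15, 0.05]

sharpened = temperature_adaptor(next_token_probs, 0.5)
truncated = top_k_adaptor(next_token_probs, 2)
```

Both adaptors concentrate probability mass on the tokens the model already prefers, which lowers the entropy of the distribution being sampled from. Varying the temperature or `k` is the kind of knob the findings above refer to.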
Abstract
The relationship between the quality of a string, as judged by a human reader, and its probability under a language model undergirds the development of better language models. For example, many popular algorithms for sampling from a language model have been conceived with the goal of manipulating its probability distribution to place higher probability on strings that humans deem of high quality. In this article, we examine the probability--quality relationship in language models explicitly aligned to human preferences, e.g., through reinforcement learning from human feedback. We show that, when sampling corpora from an aligned language model, there exists a trade-off between the strings' average reward and average log-likelihood under the prior language model, i.e., the same model before alignment with human preferences. We provide a formal treatment of this phenomenon…
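One way to see where such a trade-off can come from is sketched below. It assumes the aligned model is the optimum of the standard KL-regularized reward-maximization objective; the symbols $p$ (prior model), $q$ (aligned model), $r$ (reward), $\beta > 0$ (regularization strength), and $Z$ (normalizer) are notation chosen here for illustration and may not match the paper's.

```latex
% Assumed form of the aligned model (optimum of the KL-regularized objective):
q(y) = \frac{1}{Z}\, p(y)\, \exp\!\left(\frac{r(y)}{\beta}\right)

% Taking logs and rearranging, for every string y:
\frac{1}{\beta}\, r(y) + \log p(y) = \log q(y) + \log Z

% Averaging over a corpus y_1, \dots, y_N:
\frac{1}{\beta} \cdot \frac{1}{N}\sum_{n=1}^{N} r(y_n)
  + \frac{1}{N}\sum_{n=1}^{N} \log p(y_n)
  = \frac{1}{N}\sum_{n=1}^{N} \log q(y_n) + \log Z
```

Under this assumption, whenever the right-hand side is roughly constant across sampled corpora, a corpus with a higher average reward must have a lower average log-likelihood under the prior, and vice versa.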
Taxonomy
Topics: Natural Language Processing Techniques
