Explaining and Improving Contrastive Decoding by Extrapolating the Probabilities of a Huge and Hypothetical LM

Haw-Shiuan Chang; Nanyun Peng; Mohit Bansal; Anil Ramakrishna; Tagyoung Chung

arXiv:2411.01610·cs.CL·November 5, 2024

TL;DR

This paper provides a theoretical account of contrastive decoding (CD), showing that it can be viewed as linearly extrapolating next-token logits toward a huge, hypothetical LM. It also introduces APD, a new decoding method that improves factuality and downstream performance without more inference cost than CD.
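As a rough illustration of the linear-extrapolation view, the sketch below scores a toy vocabulary with the CD objective of Li et al. (2023), log p_expert minus log p_amateur restricted to tokens whose expert probability is at least α times the expert's maximum. Alongside it is the logit-space step l_e + β(l_e - l_a), which moves past the expert along the amateur-to-expert direction. The helper names, the three-token vocabulary, the logits, and the α and β values are illustrative assumptions; the paper's exact parameterization may differ, and the official code lives in the amazon-science/llm-asymptotic-decoding repository.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def cd_scores(expert_logits, amateur_logits, alpha=0.1):
    """CD score in the form of Li et al. (2023): log p_expert - log p_amateur,
    masked to -inf unless p_expert >= alpha * max(p_expert)."""
    lp_e = log_softmax(expert_logits)
    lp_a = log_softmax(amateur_logits)
    cutoff = math.log(alpha) + max(lp_e)
    return [e - a if e >= cutoff else -math.inf for e, a in zip(lp_e, lp_a)]


def extrapolated_logits(expert_logits, amateur_logits, beta):
    """Hypothetical helper: step beta units past the expert along the
    amateur -> expert direction, l_e + beta * (l_e - l_a),
    which equals (1 + beta) * l_e - beta * l_a."""
    return [e + beta * (e - a) for e, a in zip(expert_logits, amateur_logits)]


# Toy vocabulary ["Paris", "Lyon", "the"] with made-up logits. The expert
# raises p("Paris") from about 0.71 (amateur) to about 0.88, but p("Lyon")
# grows by a larger ratio.
expert = [5.0, 3.0, 0.0]
amateur = [4.0, 1.0, 3.0]

print(argmax(expert))                                  # 0: expert prefers "Paris"
print(argmax(cd_scores(expert, amateur)))              # 1: CD picks "Lyon"
print(extrapolated_logits(expert, amateur, beta=1.0))  # [6.0, 5.0, -3.0]
print(argmax(extrapolated_logits(expert, amateur, beta=3.0)))  # 1
```

Dividing l_e + β(l_e - l_a) by β shows that, as β grows, the ranking of the extrapolated logits approaches the ranking of l_e - l_a, which is also the ranking of CD's score before masking (the two log-normalizers only add a constant). In this toy case a small step keeps "Paris" on top, while a large step, like CD itself, selects "Lyon".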

Contribution

It offers a theoretical analysis of contrastive decoding and proposes APD, a novel unsupervised decoding method that extrapolates LM probabilities to enhance text generation.

Findings

01

APD significantly improves factuality over CD in open-ended generation.

02

APD achieves state-of-the-art factuality results on FactualityPrompts for Pythia 6.9B and OPT 6.7B.

03

On commonsense QA tasks, APD often outperforms CD and achieves an effect similar to using a larger LLM.

Abstract

Contrastive decoding (CD) (Li et al., 2023) improves the next-token distribution of a large expert language model (LM) using a small amateur LM. Although CD is applied to various LMs and domains to enhance open-ended text generation, it is still unclear why CD often works well, when it could fail, and how we can make it better. To deepen our understanding of CD, we first theoretically prove that CD could be viewed as linearly extrapolating the next-token logits from a huge and hypothetical LM. We also highlight that the linear extrapolation could make CD unable to output the most obvious answers that have already been assigned high probabilities by the amateur LM. To overcome CD's limitation, we propose a new unsupervised decoding method called Asymptotic Probability Decoding (APD). APD explicitly extrapolates the probability curves from the LMs of…
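The contrast between a linear step and an asymptotic estimate can be made concrete with a toy curve fit. The sketch below is only a stand-in for the idea of extrapolating probability curves across model sizes, not APD itself. It assumes, hypothetically, that a token's probability approaches a limit as p(s) = p_inf - c * s^(-k) in model size s, with a fixed exponent k = 0.5, and recovers p_inf as the intercept of a least-squares line fit of p against s^(-k). The function name `asymptotic_prob`, the model sizes, and the probabilities are all invented for illustration; the actual method is in the amazon-science/llm-asymptotic-decoding repository.

```python
import math


def asymptotic_prob(sizes, probs, k=0.5):
    """Toy asymptote estimate (hypothetical curve family, not APD's model).

    Assumes p(s) = p_inf - c * s**(-k). Substituting x = s**(-k) turns this
    into the straight line p = p_inf - c * x, so p_inf is the intercept at
    x = 0 (an infinitely large model) of an ordinary least-squares fit.
    """
    xs = [s ** -k for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(probs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, probs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return min(1.0, max(0.0, intercept))  # keep the estimate a valid probability


# Hypothetical model sizes (billions of parameters) and made-up probabilities
# for the obvious answer "Paris" and the alternative "Lyon".
sizes_b = [0.07, 1.0, 6.9]
paris = [0.90, 0.92, 0.93]
lyon = [0.01, 0.03, 0.04]

# A CD-style log ratio between the largest and smallest model favors "Lyon"...
print(math.log(lyon[-1] / lyon[0]) > math.log(paris[-1] / paris[0]))  # True
# ...while the extrapolated asymptotes keep "Paris" on top.
print(asymptotic_prob(sizes_b, paris) > asymptotic_prob(sizes_b, lyon))  # True
```

Fitting in the transformed variable s^(-k) keeps the regression linear. Because a probability that is already close to 1 can only rise a little, its estimated asymptote stays high instead of being penalized, which is the behavior the abstract says a linear logit extrapolation can miss.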

Code & Models

Repositories

amazon-science/llm-asymptotic-decoding (PyTorch, official)

Videos

Explaining and Improving Contrastive Decoding by Extrapolating the Probabilities of a Huge and Hypothetical LM (Underline)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Pythia · OPT