Reverse-Engineering Decoding Strategies Given Blackbox Access to a Language Generation System
Daphne Ippolito, Nicholas Carlini, Katherine Lee, Milad Nasr, Yun William Yu

TL;DR
This paper develops methods to identify the decoding strategy used in blackbox language models, which helps in detecting generated text and understanding biases introduced by decoding choices.
Contribution
It introduces techniques to reverse-engineer decoding strategies like top-k and nucleus sampling from blackbox APIs, including proprietary systems like ChatGPT.
Findings
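Decoding strategies can be accurately identified across several families of language models.
The process of identifying the strategy reveals biases caused by decoding settings that severely truncate a model's predicted distributions.
The methods apply to both open-source models and production systems such as ChatGPT.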
Abstract
Neural language models are increasingly deployed into APIs and websites that allow a user to pass in a prompt and receive generated text. Many of these systems do not reveal generation parameters. In this paper, we present methods to reverse-engineer the decoding method used to generate text (i.e., top-k or nucleus sampling). Our ability to discover which decoding strategy was used has implications for detecting generated text. Additionally, the process of discovering the decoding strategy can reveal biases caused by selecting decoding settings which severely truncate a model's predicted distributions. We perform our attack on several families of open-source language models, as well as on production systems (e.g., ChatGPT).
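To make the idea concrete, the following is a minimal sketch of one way a top-k setting could be estimated from samples alone. It is an illustration under stated assumptions, not the authors' procedure: `make_topk_sampler` is a hypothetical stand-in for a blackbox API, the toy distribution is invented, and the estimator assumes a prompt whose next-token distribution is fairly flat, so that every token that survives truncation shows up within a modest number of queries.

```python
import random


def make_topk_sampler(probs, k, seed=0):
    """Hypothetical blackbox: returns a function that samples one token id
    from the top-k tokens of `probs`, renormalized by their weights."""
    rng = random.Random(seed)
    # Keep only the k most likely token ids (the top-k truncation).
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]

    def sample():
        return rng.choices(top, weights=weights, k=1)[0]

    return sample


def estimate_k(sample, num_samples):
    """Count distinct tokens observed across repeated queries.
    This is a lower bound on k: tokens removed by truncation never appear,
    but rare surviving tokens may also be missed with too few samples."""
    return len({sample() for _ in range(num_samples)})


# Toy, nearly uniform next-token distribution over 50 token ids (assumed).
vocab_size = 50
raw = [1.0 + 0.01 * i for i in range(vocab_size)]
total = sum(raw)
probs = [r / total for r in raw]

# The attacker only sees `blackbox`, not the hidden k.
blackbox = make_topk_sampler(probs, k=10, seed=0)
k_hat = estimate_k(blackbox, num_samples=2000)
print("estimated k:", k_hat)
```

The distinct-token count can never exceed the true k, so in practice the number of queries has to be large relative to the inverse of the smallest surviving token probability. Distinguishing top-k from nucleus sampling would need more than this count, for example comparing estimates across prompts with different distribution shapes.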
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
