Coupling without Communication and Drafter-Invariant Speculative Decoding
Majid Daliri, Christopher Musco, Ananda Theertha Suresh

TL;DR
This paper studies communication-free coupling of distributions, introduces Gumbel sampling as an improvement over Weighted MinHash, and applies both methods to speculative decoding in language models, where Gumbel sampling achieves higher success probabilities.
Contribution
The paper provides a simpler proof of optimality for communication-free coupling and introduces Gumbel sampling as a Pareto improvement over Weighted MinHash, with practical applications in language model decoding.
Findings
Gumbel sampling achieves higher success probability than Weighted MinHash.
Communication-free protocols can be used for fixed-output speculative decoding.
Gumbel sampling outperforms Weighted MinHash in language generation experiments.
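The following is a minimal sketch of a communication-free coupling by Gumbel sampling, not the authors' code. It assumes the standard "exponential race" form of the Gumbel-max trick: both parties read the same public uniforms u_i and each returns the index minimizing -ln(u_i) / p_i under their own distribution. The helper names `gumbel_sample` and `agreement_rate`, the trial count, and the seed are illustrative choices.

```python
import numpy as np

def gumbel_sample(probs, shared_uniforms):
    """Return argmin_i of -ln(u_i) / p_i along the last axis.

    With u_i uniform on [0, 1), the winner is distributed according to
    `probs`. Items with zero probability get an infinite key and never win.
    """
    probs = np.asarray(probs, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        keys = -np.log(shared_uniforms) / probs
    keys = np.where(probs > 0, keys, np.inf)
    return np.argmin(keys, axis=-1)

def agreement_rate(p, q, trials=20000, seed=0):
    """Estimate Pr[a = b] when Alice and Bob share only the uniforms."""
    rng = np.random.default_rng(seed)
    u = rng.random((trials, len(p)))   # public randomness, one row per trial
    a = gumbel_sample(p, u)            # Alice uses P and never sees Q
    b = gumbel_sample(q, u)            # Bob uses Q and never sees P
    return float(np.mean(a == b))

if __name__ == "__main__":
    P = [0.5, 0.3, 0.2]
    Q = [0.3, 0.3, 0.4]
    print("empirical agreement:", agreement_rate(P, Q))
```

Because both calls read the same `u`, the two samples are correlated even though neither party sees the other's distribution. When P and Q are identical, the two outputs always agree.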
Abstract
Suppose Alice has a distribution $P$ and Bob has a distribution $Q$. Alice wants to draw a sample $a \sim P$ and Bob a sample $b \sim Q$ such that $a = b$ with as high a probability as possible. It is well-known that, by sampling from an optimal coupling between the distributions, Alice and Bob can achieve $\Pr[a = b] = 1 - D_{TV}(P, Q)$, where $D_{TV}(P, Q)$ is the total variation distance between $P$ and $Q$. What if Alice and Bob must solve this same problem without communicating at all? Perhaps surprisingly, with access to public randomness, they can still achieve $\Pr[a = b] \geq \frac{1 - D_{TV}(P, Q)}{1 + D_{TV}(P, Q)} \geq 1 - 2 D_{TV}(P, Q)$ using a simple protocol based on the Weighted MinHash algorithm. This bound was shown to be optimal in the worst-case by [Bavarian et al., 2020]. In this work, we revisit the communication-free coupling problem. We provide a simpler proof of…
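As a small numerical illustration of the quantities in the abstract, the sketch below computes the total variation distance between two finite distributions, the agreement probability 1 - D of an optimal coupling, and a communication-free guarantee. The form (1 - D) / (1 + D) for that guarantee is taken as an assumption here, and the function names and example distributions are hypothetical.

```python
def total_variation(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def coupling_bounds(p, q):
    """Agreement probabilities with and without communication.

    Assumes the communication-free guarantee is (1 - D) / (1 + D),
    where D is the total variation distance.
    """
    d = total_variation(p, q)
    return {
        "tv": d,
        "with_communication": 1 - d,             # optimal coupling
        "communication_free": (1 - d) / (1 + d)  # assumed worst-case guarantee
    }

if __name__ == "__main__":
    P = [0.5, 0.3, 0.2]
    Q = [0.3, 0.3, 0.4]
    print(coupling_bounds(P, Q))
```

For any D in [0, 1], (1 - D) / (1 + D) is at least 1 - 2D, so giving up communication costs at most a factor of two in the disagreement probability.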
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Database Systems and Queries
