Approximately Aligned Decoding
Daniel Melcer, Sujan Gonugondla, Pramuditha Perera, Haifeng Qian, Wen-Hao Chiang, Yanjun Wang, Nihal Jain, Pranav Garg, Xiaofei Ma, Anoop Deoras

TL;DR
Approximately Aligned Decoding (AprAD) is a computationally efficient method for generating long, constrained sequences from Large Language Models (LLMs) with little distortion of the output distribution. It matches the task performance of distribution-preserving methods at much lower computational cost.
Contribution
AprAD introduces a novel decoding algorithm, inspired by speculative decoding, that balances distortion of the output distribution against computational efficiency in constrained text generation.
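For readers unfamiliar with the speculative decoding literature, the sketch below shows its standard verification rule. A token drafted from a distribution p is kept with probability min(1, q/p) under a target distribution q. At the first rejection, that position is resampled from the renormalized residual max(0, q - p), so the output is distributed according to q. This is a minimal sketch of that general rule, not of AprAD itself. It assumes per-position next-token distributions are given as dictionaries. The function name `speculative_verify` is hypothetical, and how AprAD maps its own distributions onto p and q is not shown here.

```python
import random
from typing import Dict, List, Tuple

Dist = Dict[str, float]  # token -> probability at one position


def speculative_verify(
    draft: List[str],
    p_draft: List[Dist],
    p_target: List[Dist],
    rng: random.Random,
) -> Tuple[List[str], int]:
    """Verify tokens sampled from p_draft against p_target (hypothetical helper).

    p_draft[i] and p_target[i] are next-token distributions conditioned on draft[:i].
    Returns the kept tokens and the index of the first rejected position
    (len(draft) if every drafted token is accepted).
    """
    kept: List[str] = []
    for i, tok in enumerate(draft):
        p = p_draft[i].get(tok, 0.0)
        q = p_target[i].get(tok, 0.0)
        # Accept the drafted token with probability min(1, q / p).
        if p > 0.0 and rng.random() < min(1.0, q / p):
            kept.append(tok)
            continue
        # Rejected: resample this position from the residual max(0, q - p), renormalized.
        vocab = sorted(set(p_draft[i]) | set(p_target[i]))
        weights = [max(0.0, p_target[i].get(t, 0.0) - p_draft[i].get(t, 0.0)) for t in vocab]
        if sum(weights) <= 0.0:
            # Degenerate input (e.g. unnormalized distributions): fall back to the target.
            weights = [p_target[i].get(t, 0.0) for t in vocab]
        kept.append(rng.choices(vocab, weights=weights, k=1)[0])
        return kept, i  # everything after the first rejection is discarded
    return kept, len(draft)
```

Because everything after the first rejected position is discarded, a verifier of this kind can keep a long accepted prefix instead of restarting generation from scratch.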
Findings
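AprAD maintains task-specific performance comparable to methods that do not distort the output distribution but are less efficient.
AprAD significantly reduces computational cost in constrained decoding relative to those distribution-preserving methods.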
AprAD amplifies low-probability outputs much less than existing constrained-decoding methods, keeping distortion of the output distribution small.
Abstract
It is common to reject undesired outputs of Large Language Models (LLMs); however, current methods to do so require an excessive amount of computation to re-sample after a rejection, or distort the distribution of outputs by constraining the output to highly improbable tokens. We present a method, Approximately Aligned Decoding (AprAD), to balance the distortion of the output distribution with computational efficiency, inspired by algorithms from the speculative decoding literature. AprAD allows for the generation of long sequences of text with difficult-to-satisfy constraints, while amplifying low-probability outputs much less than existing methods do. We show through a series of experiments that the task-specific performance of AprAD is comparable to methods that do not distort the output distribution, while being much more computationally efficient.
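The distortion caused by constraining output token by token can be seen on a small, hypothetical two-token model over the vocabulary {a, b}, in which the sequence "aa" is forbidden. The sketch compares exact conditioning on the constraint (the distribution that resampling from scratch after every rejection targets) with token-level masking, which removes any next token that would complete a forbidden sequence and renormalizes. The model probabilities and names such as `FIRST`, `SECOND`, and `masked_distribution` are invented for illustration, and AprAD itself is not implemented here.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical toy model over two-token sequences drawn from the vocabulary {a, b}.
FIRST = {"a": F(9, 10), "b": F(1, 10)}             # P(x1)
SECOND = {"a": {"a": F(99, 100), "b": F(1, 100)},  # P(x2 | x1 = a)
          "b": {"a": F(1, 2), "b": F(1, 2)}}       # P(x2 | x1 = b)
FORBIDDEN = {("a", "a")}                           # constraint: never emit "aa"


def model_prob(seq):
    """Unconstrained probability of a two-token sequence."""
    return FIRST[seq[0]] * SECOND[seq[0]][seq[1]]


def ideal_distribution():
    """Exact conditioning on the constraint (the target of resampling from scratch)."""
    allowed = {s: model_prob(s) for s in product("ab", repeat=2) if s not in FORBIDDEN}
    z = sum(allowed.values())
    return {s: p / z for s, p in allowed.items()}


def masked_distribution():
    """Token-level masking: drop next tokens that complete a forbidden sequence, renormalize."""
    out = {}
    for x1, p1 in FIRST.items():  # in this toy, no single-token prefix is forbidden
        ok = {x2: p for x2, p in SECOND[x1].items() if (x1, x2) not in FORBIDDEN}
        z = sum(ok.values())
        for x2, p in ok.items():
            out[(x1, x2)] = p1 * p / z
    return out


if __name__ == "__main__":
    ideal, masked = ideal_distribution(), masked_distribution()
    for seq in sorted(ideal):
        print("".join(seq), f"ideal={float(ideal[seq]):.3f}", f"masked={float(masked[seq]):.3f}")
```

In this toy, "ab" has probability 9/109 (about 0.083) under exact conditioning, but masking assigns it 9/10, roughly an 11-fold amplification of an improbable output. The masked sampler commits to the likely first token "a" and is then forced onto its improbable continuation.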
