Reflection-Window Decoding: Text Generation with Selective Refinement
Zeyu Tang, Zhenhao Chen, Xiangchen Song, Loka Li, Yunlong Deng, Yifan Shen, Guangyi Chen, Peter Spirtes, Kun Zhang

TL;DR
This paper introduces Reflection-Window Decoding, a text generation method that selectively refines already-generated content during decoding, with the aim of bringing responses from large language models closer to optimal.
Contribution
It proposes a novel decoding framework with a sliding reflection window and pausing criterion to enhance response quality without sacrificing efficiency.
Findings
The method improves text generation quality over autoregressive decoding.
Experimental results show better response optimality with comparable efficiency.
The approach effectively balances refinement and generation during decoding.
Abstract
The autoregressive decoding for text generation in large language models (LLMs), while widely used, is inherently suboptimal due to the lack of a built-in mechanism to perform refinement and/or correction of the generated content. In this paper, we consider optimality in terms of the joint probability over the generated response, when jointly considering all tokens at the same time. We theoretically characterize the potential deviation of the autoregressively generated response from its globally optimal counterpart that is of the same length. Our analysis suggests that we need to be cautious when noticeable uncertainty arises during text generation, which may signal the sub-optimality of the generation history. To address the pitfall of autoregressive decoding for text generation, we propose an approach that incorporates a sliding reflection window and a pausing criterion, such that…
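To make the abstract's notion of optimality concrete, here is a minimal sketch on a hypothetical two-token toy model (the `TOY_MODEL` table, its probabilities, and all function names are illustrative assumptions, not taken from the paper). It compares a greedy autoregressive response against the same-length response with the highest joint probability, and uses next-token entropy as one possible stand-in for the "noticeable uncertainty" the abstract mentions. The paper's actual reflection window and pausing criterion are not specified in the excerpt above and are not reproduced here.

```python
import itertools
import math

# Hypothetical toy "language model": next-token distributions keyed by prefix.
# These numbers are made up for illustration only.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"a": 0.55, "b": 0.45},
    ("b",): {"a": 0.9, "b": 0.1},
}
VOCAB = ("a", "b")


def joint_prob(seq):
    """Joint probability of a full sequence under the chain rule."""
    p = 1.0
    for i, tok in enumerate(seq):
        p *= TOY_MODEL[tuple(seq[:i])][tok]
    return p


def greedy_decode(length):
    """Autoregressive decoding: commit to the most likely token at each step."""
    seq = ()
    for _ in range(length):
        dist = TOY_MODEL[seq]
        seq += (max(dist, key=dist.get),)
    return seq


def best_sequence(length):
    """Exhaustive search for the same-length sequence with maximal joint probability."""
    return max(itertools.product(VOCAB, repeat=length), key=joint_prob)


def entropy(dist):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


if __name__ == "__main__":
    greedy = greedy_decode(2)
    optimal = best_sequence(2)
    print("greedy :", greedy, joint_prob(greedy))
    print("optimal:", optimal, joint_prob(optimal))
    # Uncertainty at step 2, given each possible first token.
    print("entropy after 'a':", entropy(TOY_MODEL[("a",)]))
    print("entropy after 'b':", entropy(TOY_MODEL[("b",)]))
```

In this toy setting the greedy response starts with the locally most likely token yet ends up with a lower joint probability than the best same-length alternative, and the next-token distribution after the greedy first choice is close to uniform. That is the kind of situation in which, per the abstract, uncertainty may signal a suboptimal generation history.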
Taxonomy
Topics: Natural Language Processing Techniques
