SJD-PV: Speculative Jacobi Decoding with Phrase Verification for Autoregressive Image Generation
Zhehao Yu, Baoquan Zhang, Bingqi Shan, Xinhao Liu, Dongliang Zhou, Guotao Liang, Guangming Ye, Yunming Ye

TL;DR
This paper introduces SJD-PV, a training-free method that accelerates autoregressive image generation by verifying groups of co-occurring tokens as phrases, reducing inference time while maintaining quality.
Contribution
The paper proposes a novel phrase-level speculative verification framework that leverages token co-occurrence statistics for faster autoregressive image decoding without retraining.
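The following Python sketch shows how token-level and phrase-level verification differ. It assumes the probabilistic acceptance test from speculative sampling: a draft token is accepted with probability min(1, p/q), where q is the distribution the draft was sampled from and p is the distribution from the current forward pass. The sketch applies this test to a whole phrase by multiplying the per-token ratios. The function name verify_phrases, the product-of-ratios rule, and the phrase_lengths partition are illustrative assumptions. They are not the authors' exact criterion, which this summary does not specify.

```python
import random
from typing import Sequence


def verify_phrases(
    draft: Sequence[int],
    p_probs: Sequence[float],
    q_probs: Sequence[float],
    phrase_lengths: Sequence[int],
    rng: random.Random,
) -> int:
    """Return the number of leading draft tokens accepted in one iteration.

    Hypothetical phrase-level acceptance rule (illustration only).
    draft:          draft token ids in the window, left to right
    p_probs[i]:     probability of draft[i] under the current forward pass
    q_probs[i]:     probability of draft[i] under the distribution it was
                    sampled from (nonzero, since draft[i] was drawn from it)
    phrase_lengths: assumed split of the window into consecutive phrases
    """
    if not (len(draft) == len(p_probs) == len(q_probs) == sum(phrase_lengths)):
        raise ValueError("window and phrase lengths do not match")
    accepted = 0
    for length in phrase_lengths:
        # Joint ratio over the phrase: product of per-token p/q ratios.
        ratio = 1.0
        for i in range(accepted, accepted + length):
            ratio *= p_probs[i] / q_probs[i]
        # Accept or reject the whole phrase as one unit.
        if rng.random() < min(1.0, ratio):
            accepted += length
        else:
            break  # tokens after the first rejected phrase are redrafted
    return accepted


rng = random.Random(0)
p = [0.25, 0.5, 0.5, 0.4]
q = [0.5, 0.25, 0.5, 0.4]
# Two phrases of two tokens each. Both joint ratios are exactly 1.0, so both pass.
print(verify_phrases([11, 42, 7, 7], p, q, [2, 2], rng))  # 4
```

Setting phrase_lengths to [1] * len(draft) reduces the sketch to ordinary token-by-token verification, which is the independence assumption the abstract criticizes. With phrases, a token whose own ratio is low can still be accepted when its phrase partner compensates. In the example above, the first token would pass only about half the time if it were checked on its own (ratio 0.5).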
Findings
Achieves up to 30% faster decoding.
Significantly reduces the number of function evaluations (NFE).
Maintains high visual fidelity during accelerated generation.
Abstract
Autoregressive (AR) image models have recently demonstrated remarkable generative capability, but their sequential nature results in significant inference latency. Existing training-free acceleration methods typically verify tokens independently, overlooking the strong co-occurrence patterns between adjacent visual tokens. This independence assumption often leads to contextual inconsistency and limits decoding efficiency. In this work, we introduce a novel training-free acceleration framework that performs phrase-level speculative verification, enabling the model to jointly validate multiple correlated tokens within each decoding window. To construct such phrase units, we analyze token co-occurrence statistics from the training corpus and group frequently co-occurring tokens into semantically coherent visual phrases. During inference, the proposed phrase-level verification evaluates…
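The phrase-construction step can be sketched in Python under a few simplifying assumptions. Co-occurrence is measured as raw counts of token pairs that sit next to each other in raster order. A decoding window is then split greedily: the current phrase grows while the next adjacent pair has been seen at least min_count times and the phrase is shorter than max_len. The helper names, the threshold rule, and the restriction to left-right neighbors are all hypothetical. The abstract does not state which statistic or grouping rule the authors use, and 2-D image tokens also have vertical neighbors that this sketch ignores.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def count_adjacent_pairs(corpus: Iterable[Sequence[int]]) -> Counter:
    """Count how often token a is immediately followed by token b."""
    pairs: Counter = Counter()
    for seq in corpus:
        pairs.update(zip(seq, seq[1:]))
    return pairs


def group_into_phrases(
    window: Sequence[int], pairs: Counter, min_count: int, max_len: int
) -> List[int]:
    """Greedily split a window into phrases and return the phrase lengths.

    Assumed rule: extend the current phrase while the adjacent pair is
    frequent enough and the phrase has not reached max_len.
    """
    lengths: List[int] = []
    current = 1
    for a, b in zip(window, window[1:]):
        if pairs[(a, b)] >= min_count and current < max_len:
            current += 1  # frequent pair: keep b in the same phrase
        else:
            lengths.append(current)  # close the phrase, start a new one at b
            current = 1
    if window:
        lengths.append(current)
    return lengths


corpus = [[1, 2, 3, 4], [1, 2, 5], [3, 4, 1, 2]]
pairs = count_adjacent_pairs(corpus)
print(group_into_phrases([1, 2, 3, 4, 9], pairs, min_count=2, max_len=4))  # [2, 2, 1]
```

Here (1, 2) occurs three times and (3, 4) twice, so each pair forms a phrase. The unseen pair (4, 9) starts a new single-token phrase. The phrase lengths always sum to the window length, so they can serve directly as the unit boundaries for joint verification.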
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Face recognition and analysis
