Grouped Speculative Decoding for Autoregressive Image Generation

Junhyuk So; Juncheol Shin; Hyunho Kook; Eunhyeok Park

arXiv:2508.07747·cs.CV·August 12, 2025

TL;DR

Grouped Speculative Decoding (GSD) speeds up autoregressive image models by evaluating clusters of tokens instead of single tokens, achieving an average 3.7x acceleration without extra training while keeping image quality comparable to the baseline.

Contribution

We introduce GSD, a training-free decoding acceleration method that leverages token clustering to improve the inference speed of autoregressive (AR) image models.

Findings

01. GSD achieves an average of 3.7x acceleration.

02. GSD maintains image quality comparable to baseline models.

03. Dynamic clustering outperforms static methods in token evaluation.

Abstract

Recently, autoregressive (AR) image models have demonstrated remarkable generative capabilities, positioning themselves as a compelling alternative to diffusion models. However, their sequential nature leads to long inference times, limiting their practical scalability. In this work, we introduce Grouped Speculative Decoding (GSD), a novel, training-free acceleration method for AR image models. While recent studies have explored Speculative Decoding (SD) as a means to speed up AR image generation, existing approaches either provide only modest acceleration or require additional training. Our in-depth analysis reveals a fundamental difference between language and image tokens: image tokens exhibit inherent redundancy and diversity, meaning multiple tokens can convey valid semantics. However, traditional SD methods are designed to accept only a single most-likely token, which fails to…
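
To make the idea of evaluating clusters instead of single tokens concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard speculative-decoding acceptance rule min(1, p(x)/q(x)) for a draft token x, and contrasts it with a hypothetical group-level variant that compares the total probability mass of the cluster containing x. The function names, the toy distributions, and the fixed cluster assignment `cluster_of` are all illustrative assumptions; the page above does not specify how clusters are built or how rejections are handled.

```python
import numpy as np


def token_accept_prob(p, q, x):
    """Standard speculative-decoding acceptance probability for draft token x.

    p: target-model distribution over the vocabulary
    q: draft-model distribution over the vocabulary
    """
    return min(1.0, p[x] / q[x])


def group_accept_prob(p, q, x, cluster_of):
    """Hypothetical group-level acceptance probability (illustrative only).

    Instead of comparing p[x] and q[x], compare the probability mass that
    each model assigns to the whole cluster containing x.
    """
    members = cluster_of == cluster_of[x]  # boolean mask of x's cluster
    return min(1.0, p[members].sum() / q[members].sum())


# Toy vocabulary of 4 tokens. Tokens 0 and 1 are assumed to be
# near-interchangeable (for example, visually similar codebook entries).
p = np.array([0.10, 0.40, 0.30, 0.20])  # target model (assumed values)
q = np.array([0.40, 0.10, 0.30, 0.20])  # draft model (assumed values)
cluster_of = np.array([0, 0, 1, 2])     # hypothetical static clustering

x = 0  # token proposed by the draft model
token_level = token_accept_prob(p, q, x)
group_level = group_accept_prob(p, q, x, cluster_of)

print(f"token-level acceptance: {token_level:.2f}")
print(f"group-level acceptance: {group_level:.2f}")
```

In this toy case the two models disagree about which of tokens 0 and 1 is more likely but agree on the combined mass of the pair, so the token-level test accepts the draft only a quarter of the time while the group-level test always accepts it. Higher acceptance rates are what translate into fewer target-model passes per generated token.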
