Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding

Yao Teng; Han Shi; Xian Liu; Xuefei Ning; Guohao Dai; Yu Wang; Zhenguo Li; Xihui Liu

arXiv:2410.01699·cs.CV·March 5, 2025

TL;DR

This paper introduces a training-free probabilistic decoding method called Speculative Jacobi Decoding (SJD) that accelerates auto-regressive text-to-image generation while preserving diversity and quality.

Contribution

The paper proposes SJD, a novel probabilistic parallel decoding algorithm that speeds up auto-regressive image generation without training and maintains sampling diversity.

Findings

01

SJD significantly reduces inference steps in text-to-image models.

02

SJD maintains high visual quality comparable to traditional methods.

03

Token initialization strategies further enhance acceleration in specific scenarios.

Abstract

Current large auto-regressive models can generate high-quality, high-resolution images, but they require hundreds or even thousands of next-token prediction steps during inference, resulting in substantial time consumption. In existing studies, Jacobi decoding, an iterative parallel decoding algorithm, has been used to accelerate auto-regressive generation and can be executed without training. However, Jacobi decoding relies on a deterministic criterion to determine the convergence of iterations. Thus, it works for greedy decoding but is incompatible with sampling-based decoding, which is crucial for visual quality and diversity in current auto-regressive text-to-image generation. In this paper, we propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD), to accelerate auto-regressive text-to-image generation. By…
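To make the deterministic convergence criterion concrete, the following is a minimal sketch of plain greedy Jacobi decoding, the baseline the abstract describes, not SJD itself. Everything in it is an illustrative assumption: `toy_next_token` is a hypothetical stand-in for a model's argmax next-token prediction, and the per-position loop stands in for what would be a single parallel forward pass in a real model.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def toy_next_token(prefix: List[int]) -> int:
    """Hypothetical deterministic (greedy) next-token predictor."""
    return (sum(prefix) * 31 + len(prefix)) % 17


def autoregressive_decode(prompt: List[int], n: int, next_token: NextToken) -> List[int]:
    """Reference decoding: one token per step, n steps in total."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(next_token(seq))
    return seq[len(prompt):]


def jacobi_decode(prompt: List[int], n: int, next_token: NextToken) -> Tuple[List[int], int]:
    """Greedy Jacobi decoding: refine all n draft tokens at once until nothing changes."""
    draft = [0] * n  # arbitrary initial guess for every position
    iterations = 0
    while True:
        iterations += 1
        # Each position is re-predicted from the prompt plus the *previous* draft
        # prefix; a real model would compute all positions in one parallel pass.
        updated = [next_token(list(prompt) + draft[:i]) for i in range(n)]
        # Deterministic convergence criterion: the draft is a fixed point.
        if updated == draft:
            return draft, iterations
        draft = updated


if __name__ == "__main__":
    tokens, iters = jacobi_decode([3, 1, 4], 8, toy_next_token)
    print(tokens == autoregressive_decode([3, 1, 4], 8, toy_next_token), iters)
```

The exact-equality test `updated == draft` is only meaningful when each prediction is deterministic. If tokens were sampled instead, two passes over the same draft would generally disagree, which is the incompatibility with sampling-based decoding that the abstract points out.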

Taxonomy

Topics: Image Retrieval and Classification Techniques · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization