Exploiting Discriminative Codebook Prior for Autoregressive Image Generation

Longxiang Tang; Ruihang Chu; Xiang Wang; Yujin Han; Pingyu Wu; Chunming He; Yingya Zhang; Shiwei Zhang; Jiaya Jia

arXiv:2508.10719 · cs.CV · August 15, 2025

TL;DR

This paper introduces DCPE (Discriminative Codebook Prior Extractor), a method that extracts the token similarity prior encoded in the codebook of an autoregressive image generator. Used in place of k-means clustering, it improves both training efficiency and image quality.

Contribution

The paper proposes DCPE as an alternative to k-means clustering. DCPE mines the token similarity information in the codebook using instance-based distances and agglomerative merging.
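
To make "instance-based distances and agglomerative merging" concrete, here is a minimal sketch that is not the authors' implementation. It assumes Euclidean distance between token embeddings and average linkage (the mean pairwise distance between the member tokens of two clusters) as the instance-based distance; the function names and the toy codebook are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two token embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def instance_distance(cluster_a, cluster_b, codebook):
    """Mean pairwise distance between member tokens (assumed linkage).

    No centroid is computed: the distance depends only on the actual
    token embeddings that belong to each cluster.
    """
    total = sum(
        euclidean(codebook[i], codebook[j])
        for i in cluster_a
        for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative_merge(codebook, num_clusters):
    """Start with one cluster per token and merge the closest pair
    until only num_clusters clusters remain."""
    clusters = [[i] for i in range(len(codebook))]
    while len(clusters) > num_clusters:
        # combinations yields pairs with a < b, so deleting b is safe.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: instance_distance(
                clusters[p[0]], clusters[p[1]], codebook
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy 2-D "codebook" with two tight pairs and one isolated token.
toy_codebook = [(0, 0), (1, 0), (10, 10), (11, 10), (20, 0)]
toy_clusters = agglomerative_merge(toy_codebook, num_clusters=3)
print(toy_clusters)
```

This brute-force loop recomputes every pairwise cluster distance on each merge, so it only suits small illustrations; a real codebook with thousands of tokens would need a cached distance matrix or a library routine.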

Findings

01. DCPE accelerates training by 42% on LlamaGen-B.
02. DCPE improves FID and IS scores in image generation.
03. DCPE seamlessly integrates with existing models.

Abstract

Advanced discrete token-based autoregressive image generation systems first tokenize images into sequences of token indices with a codebook, and then model these sequences in an autoregressive paradigm. While autoregressive generative models are trained only on index values, the prior encoded in the codebook, which contains rich token similarity information, is not exploited. Recent studies have attempted to incorporate this prior by performing naive k-means clustering on the tokens, helping to facilitate the training of generative models with a reduced codebook. However, we reveal that k-means clustering performs poorly in the codebook feature space due to inherent issues, including token space disparity and centroid distance inaccuracy. In this work, we propose the Discriminative Codebook Prior Extractor (DCPE) as an alternative to k-means clustering for more effectively mining and…
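
The two ideas at the start of the abstract, tokenizing features into codebook indices and training on a reduced codebook, can be sketched as follows. This is an illustration under assumptions: nearest-neighbor lookup with squared Euclidean distance stands in for the tokenizer's quantization step, and the cluster assignment is hard-coded rather than produced by any particular clustering method. All names and values are hypothetical.

```python
def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
        for f in features
    ]


def to_cluster_ids(token_ids, clusters):
    """Replace each token index with the id of the cluster containing it,
    giving a sequence over a smaller (reduced) vocabulary."""
    token_to_cluster = {
        token: cluster_id
        for cluster_id, members in enumerate(clusters)
        for token in members
    }
    return [token_to_cluster[t] for t in token_ids]


# Toy codebook of four 2-D entries and three feature vectors.
codebook = [(0, 0), (1, 0), (10, 10), (11, 10)]
features = [(0.2, 0.1), (10.6, 9.9), (0.9, 0.0)]

token_ids = quantize(features, codebook)

# Assumed grouping of similar tokens: {0, 1} and {2, 3}.
clusters = [[0, 1], [2, 3]]
cluster_ids = to_cluster_ids(token_ids, clusters)

print(token_ids, cluster_ids)
```

An autoregressive model trained only on `token_ids` sees tokens 0 and 1 as unrelated symbols, even though their embeddings are close. The cluster-level sequence exposes that similarity through a shared id.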
