Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction
Teng Hu, Jiangning Zhang, Ran Yi, Jieyu Weng, Yabiao Wang, Xianfang Zeng, Zhucun Xue, Lizhuang Ma

TL;DR
This paper introduces IAR, a method that improves LLM-based autoregressive visual generation through cluster-oriented token prediction, enhancing both training efficiency and robustness.
Contribution
It proposes a cluster-oriented cross-entropy loss and a codebook rearrangement strategy to improve visual token prediction in LLM-based visual generation models.
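As a rough illustration of a cluster-oriented cross-entropy loss, the Python sketch below adds a cluster-level term to the usual token-level cross-entropy for a single prediction. It is not the authors' implementation. It assumes that the codebook has already been rearranged so that cluster c holds the contiguous token ids [c * s, (c + 1) * s) for a fixed cluster size s, that a cluster's probability is the sum of its member tokens' probabilities, and that the two terms are combined with a weight `lam`. The names `logsumexp`, `cluster_oriented_ce`, and `lam` are hypothetical.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def cluster_oriented_ce(logits, target, cluster_size, lam=1.0):
    """Token-level cross-entropy plus a cluster-level cross-entropy.

    Assumption: the codebook is rearranged so that cluster c holds the
    contiguous token ids [c * cluster_size, (c + 1) * cluster_size).
    `lam` is a hypothetical weight on the cluster term.
    """
    if len(logits) % cluster_size:
        raise ValueError("vocabulary size must be divisible by cluster_size")
    lse = logsumexp(logits)
    log_probs = [x - lse for x in logits]
    # Standard cross-entropy on the exact target token.
    token_loss = -log_probs[target]
    # Cluster log-probability = log of the summed probabilities of its members.
    c = target // cluster_size
    members = log_probs[c * cluster_size:(c + 1) * cluster_size]
    cluster_loss = -logsumexp(members)
    return token_loss + lam * cluster_loss
```

In this formulation, a prediction that misses the exact token but keeps its probability mass inside the target's cluster pays a smaller cluster penalty than one whose mass moves to an unrelated cluster, which is one way to read the robustness claim above. In training, the per-position losses would be averaged over the token sequence and batch.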
Findings
Reduces training time by half for models from 100M to 1.4B parameters.
Achieves comparable FID scores with improved robustness.
Applicable to various LLM-based visual generation models.
Abstract
Employing LLMs for visual generation has recently become a research focus. However, existing methods primarily transfer the LLM architecture to visual generation but rarely investigate the fundamental differences between language and vision. This oversight may lead to suboptimal utilization of visual generation capabilities within the LLM framework. In this paper, we explore the characteristics of the visual embedding space under the LLM framework and discover that the correlation between visual embeddings can help achieve more stable and robust generation results. We present IAR, an Improved AutoRegressive Visual Generation Method that enhances the training efficiency and generation quality of LLM-based visual generation models. Firstly, we propose a Codebook Rearrangement strategy that uses a balanced k-means clustering algorithm to rearrange the visual codebook into clusters, ensuring…
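The codebook rearrangement step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the paper's code: the balanced clustering is approximated with a greedy size-constrained assignment (all point-centroid pairs are visited from closest to farthest, and each point goes to the first centroid that still has capacity), which keeps every cluster at exactly N/K codes but is not guaranteed to be optimal. The function names `sq_dist`, `balanced_kmeans`, and `rearrange_codebook` are hypothetical. After clustering, codes are reordered so that each cluster occupies a contiguous block of indices, and `old_to_new` maps original token ids to their new positions so that existing token sequences can be remapped.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_kmeans(vectors, k, iters=10, seed=0):
    """Cluster `vectors` into k groups of exactly len(vectors) // k members.

    Greedy size-constrained assignment (an approximation, not an exact
    balanced solver) alternated with centroid updates.
    """
    n = len(vectors)
    if k <= 0 or n % k:
        raise ValueError("codebook size must be divisible by k")
    cap = n // k
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * n
    for _ in range(iters):
        # Visit every (point, cluster) pair from closest to farthest and give
        # each point to the first cluster that still has room.
        pairs = sorted(
            (sq_dist(v, c), i, j)
            for i, v in enumerate(vectors)
            for j, c in enumerate(centroids)
        )
        counts = [0] * k
        placed = [False] * n
        for _, i, j in pairs:
            if not placed[i] and counts[j] < cap:
                assign[i] = j
                placed[i] = True
                counts[j] += 1
        # Move each centroid to the mean of its (exactly `cap`) members.
        for j in range(k):
            members = [vectors[i] for i in range(n) if assign[i] == j]
            centroids[j] = [sum(col) / cap for col in zip(*members)]
    return assign


def rearrange_codebook(codebook, k, **kwargs):
    """Reorder codes so each cluster occupies a contiguous index range.

    Returns the rearranged codebook and a list mapping old ids to new ids.
    """
    assign = balanced_kmeans(codebook, k, **kwargs)
    # Sort original ids by (cluster, original id) to get the new order.
    perm = sorted(range(len(codebook)), key=lambda i: (assign[i], i))
    old_to_new = [0] * len(codebook)
    for new_id, old_id in enumerate(perm):
        old_to_new[old_id] = new_id
    return [codebook[i] for i in perm], old_to_new
```

With this layout, the cluster of a rearranged token id is simply `new_id // (N // K)`, so a cluster-level loss needs no extra lookup table. An exact balanced assignment (for example via min-cost flow) could replace the greedy step at higher cost.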
Taxonomy
Topics: Data Visualization and Analytics · Video Analysis and Summarization · Image Retrieval and Classification Techniques
Methods: k-Means Clustering
