Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%
Lei Zhu, Fangyun Wei, Yanye Lu, Dong Chen

TL;DR
This paper introduces VQGAN-LC, a novel image quantization model that significantly enlarges the codebook size to 100,000 with over 99% utilization, improving performance across multiple image tasks.
Contribution
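The paper presents an approach that scales the VQGAN codebook to 100,000 entries with over 99% utilization. Rather than optimizing each codebook entry individually, it builds the codebook from a pre-trained encoder and trains a projector that aligns the whole codebook with the VQGAN encoder's features. This removes the codebook-size and utilization limits of earlier variants such as VQGAN-FC.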
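The alignment idea can be illustrated with a minimal sketch of projector-based quantization. The sketch rests on two assumptions. First, the frozen codebook is a list of pre-extracted feature vectors. Second, the projector is a single linear layer; the exact projector architecture is not specified on this page. The function names `project_codebook` and `quantize` are hypothetical. Only the forward lookup is shown. Training the projector would additionally require gradients through the quantization step, and that part is omitted.

```python
def project_codebook(codebook, weight, bias):
    """Map every frozen codebook entry through one linear projector.

    Assumption for illustration: the projector is a single linear layer.
    codebook: list of N vectors of length d_in (frozen, never updated).
    weight:   d_out x d_in matrix (the only trainable part in this sketch).
    bias:     vector of length d_out.
    """
    return [
        [sum(w * c_j for w, c_j in zip(row, c)) + b for row, b in zip(weight, bias)]
        for c in codebook
    ]


def quantize(z, projected):
    """Return (index, entry) of the projected entry nearest to z (squared L2)."""
    best = min(
        range(len(projected)),
        key=lambda k: sum((zi - ei) ** 2 for zi, ei in zip(z, projected[k])),
    )
    return best, projected[best]


if __name__ == "__main__":
    frozen = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy "pre-extracted" features
    projected = project_codebook(frozen, [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])
    print(quantize([1.9, 2.1], projected))  # (2, [2.0, 2.0])
```

The projector's parameters are shared by all entries, so a single update shifts the entire codebook at once. One plausible reading is that this helps keep entries from going unused, in contrast to per-entry optimization.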
Findings
Achieved a codebook size of 100,000 with over 99% utilization.
Demonstrated improved performance in image reconstruction and generation tasks.
Outperformed previous VQGAN variants in various image-related benchmarks.
Abstract
Image quantization, exemplified by VQGAN, encodes images into discrete tokens drawn from a codebook of predefined size. Recent advancements, particularly LLaMA 3, show that enlarging the codebook significantly enhances model performance. However, VQGAN and its derivatives, such as VQGAN-FC (Factorized Codes) and VQGAN-EMA, still struggle to expand the codebook size and to improve codebook utilization. For instance, VQGAN-FC is restricted to a codebook of at most 16,384 entries, and its utilization rate on ImageNet is typically below 12%. In this work, we propose a novel image quantization model named VQGAN-LC (Large Codebook), which extends the codebook size to 100,000 and achieves a utilization rate exceeding 99%. Unlike previous methods that optimize each codebook entry, our approach begins…
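The utilization rate in the abstract can be read as the fraction of codebook entries selected at least once when a dataset is tokenized. Under that reading, a 16,384-entry codebook below 12% utilization uses at most 1,966 distinct entries (0.12 × 16,384 = 1,966.08). Likewise, 99% of a 100,000-entry codebook means at least 99,000 entries are used. The sketch below computes this rate from per-image token maps. The function name `codebook_utilization` and its input layout are hypothetical.

```python
def codebook_utilization(token_maps, codebook_size):
    """Fraction of codebook entries that appear in at least one token map.

    token_maps: iterable of per-image sequences of integer code indices
                (hypothetical layout: one flattened token map per image).
    """
    used = set()
    for tokens in token_maps:
        for idx in tokens:
            if not 0 <= idx < codebook_size:
                raise ValueError(f"code index {idx} outside codebook of size {codebook_size}")
            used.add(idx)
    return len(used) / codebook_size


if __name__ == "__main__":
    # Two toy "images" that only ever pick codes 0, 1 and 3 from an 8-entry codebook.
    print(codebook_utilization([[0, 1, 1], [3, 3, 3]], 8))  # 0.375
```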
