Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%

Lei Zhu; Fangyun Wei; Yanye Lu; Dong Chen

arXiv:2406.11837 · cs.CV · June 18, 2024 · 1 citation

TL;DR

This paper introduces VQGAN-LC, a novel image quantization model that significantly enlarges the codebook size to 100,000 with over 99% utilization, improving performance across multiple image tasks.

Contribution

The paper presents a new approach to scale the VQGAN codebook to 100,000 entries with high utilization, using a pre-trained encoder and a projector for alignment, surpassing previous limitations.

Findings

01. Achieved a codebook size of 100,000 with over 99% utilization.

02. Demonstrated improved performance in image reconstruction and generation tasks.

03. Outperformed previous VQGAN variants in various image-related benchmarks.

Abstract

Image quantization models such as VQGAN encode images into discrete tokens drawn from a codebook of predefined size. Recent advancements, particularly with Llama 3, reveal that enlarging the codebook significantly enhances model performance. However, VQGAN and its derivatives, such as VQGAN-FC (Factorized Codes) and VQGAN-EMA, still struggle to expand the codebook size and to improve codebook utilization. For instance, VQGAN-FC is restricted to learning a codebook with a maximum size of 16,384, and its utilization rate on ImageNet is typically below 12%. In this work, we propose a novel image quantization model named VQGAN-LC (Large Codebook), which extends the codebook size to 100,000 while achieving a utilization rate exceeding 99%. Unlike previous methods that optimize each codebook entry, our approach begins…
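To make the two quantities in the abstract concrete, the following is a minimal sketch of nearest-neighbor quantization and of a utilization rate, not the authors' implementation. It assumes that a feature is mapped to the codebook entry with the smallest squared Euclidean distance, and that utilization means the fraction of codebook entries selected at least once over a dataset; this page does not state the exact metric. The function names `nearest_code` and `codebook_utilization` and the toy data are hypothetical.

```python
from typing import Iterable, Sequence


def nearest_code(feature: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `feature` (squared L2)."""
    best_index, best_dist = -1, float("inf")
    for index, entry in enumerate(codebook):
        dist = sum((f - e) ** 2 for f, e in zip(feature, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def codebook_utilization(token_ids: Iterable[int], codebook_size: int) -> float:
    """Fraction of codebook entries that appear at least once in `token_ids`.

    Assumed definition: distinct used entries divided by codebook size.
    """
    used = set(token_ids)
    if any(t < 0 or t >= codebook_size for t in used):
        raise ValueError("token id outside the codebook range")
    return len(used) / codebook_size


# Toy data (hypothetical): a 4-entry codebook and 4 two-dimensional features.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
features = [[0.1, -0.1], [0.9, 0.2], [0.2, 0.1], [0.1, 0.8]]

tokens = [nearest_code(f, codebook) for f in features]
utilization = codebook_utilization(tokens, len(codebook))
print(tokens, utilization)
```

In this toy case the entry at `[5.0, 5.0]` is never selected, so three of the four entries are used. Under the same assumed definition, a utilization below 12% for a 16,384-entry codebook would correspond to fewer than roughly 2,000 entries ever being selected.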

Code & Models

Repositories

zh460045050/vqgan-lc (PyTorch, official)

Videos

Scaling the Codebook Size of VQ-GAN to 100,000 with a Utilization Rate of 99% · SlidesLive

Taxonomy

Topics: Handwritten Text Recognition Techniques

Methods: Attention Is All You Need · Cosine Annealing · Byte Pair Encoding · Attention Dropout · Weight Decay · Dropout · Adam · Linear Warmup With Cosine Annealing · Linear Layer