Early Quantization Shrinks Codebook: A Simple Fix for Diversity-Preserving Tokenization
Wenhao Zhao, Qiran Zou, Rushi Shah, Yudi Wu, Zhouhan Lin, Dianbo Liu

TL;DR
This paper investigates representation collapse in vector quantization as used in generative models, identifies random initialization and limited encoder capacity as causes, and proposes solutions to mitigate it.
Contribution
It provides the first comprehensive analysis of representation collapses in vector quantization and offers potential fixes for these issues.
Findings
Identified the severity and triggering conditions of token (codebook) collapse and embedding collapse
Random initialization and limited encoder capacity cause collapses
Proposed solutions mitigate collapse problems
Abstract
Vector quantization is a technique in machine learning that discretizes continuous representations into a set of discrete vectors. It is widely employed in tokenizing data representations for large language models, diffusion models, and other generative models. Despite its prevalence, the characteristics and behaviors of vector quantization in generative models remain largely underexplored. In this study, we systematically investigate the issue of collapse in vector quantization, where collapsed representations are observed across discrete codebook tokens and continuous latent embeddings. By leveraging both synthetic and real datasets, we identify the severity of each type of collapse and its triggering conditions. Our analysis reveals that random initialization and limited encoder capacity result in token collapse and embedding collapse. Building on these findings, we propose potential…
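To make the abstract's terms concrete, here is a minimal sketch of nearest-neighbor vector quantization and a simple codebook-utilization measure. It is an illustration under stated assumptions, not the paper's code: the function names (`nearest_code`, `quantize`, `codebook_utilization`), the dimensions, the codebook size, and the synthetic data are all hypothetical. The codebook is drawn at random around the origin while the embeddings cluster elsewhere, which is one way token collapse (only a few codes ever being selected) can show up.

```python
import random


def nearest_code(z, codebook):
    """Return the index of the codebook vector closest to z (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(z, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(embeddings, codebook):
    """Map each continuous embedding to a discrete token (a codebook index)."""
    return [nearest_code(z, codebook) for z in embeddings]


def codebook_utilization(tokens, codebook_size):
    """Fraction of codebook entries selected at least once."""
    return len(set(tokens)) / codebook_size


# Hypothetical setup: all sizes and distributions below are assumptions for illustration.
rng = random.Random(0)
dim, codebook_size, num_embeddings = 4, 64, 1000

# Synthetic "encoder outputs" tightly clustered around the point (3, 3, 3, 3).
embeddings = [
    [3.0 + 0.1 * rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(num_embeddings)
]

# Randomly initialized codebook centered at the origin, away from the data.
random_codebook = [
    [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(codebook_size)
]

tokens = quantize(embeddings, random_codebook)
print("codebook utilization:", codebook_utilization(tokens, codebook_size))
```

A utilization value well below 1.0 means most codebook entries are never selected. This sketch only measures that symptom; it does not implement any of the mitigation strategies the paper proposes.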
Taxonomy
Topics: Language and cultural evolution · Generative Adversarial Networks and Image Synthesis · Natural Language Processing Techniques
