Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu, Bocheng Li, Yifei Xin, Zhihua Xia, Linli Xu

TL;DR
This paper introduces SimVQ, a simple linear transformation approach to prevent representation collapse in vector quantized models, improving codebook utilization and scalability across various tasks.
Contribution
We propose SimVQ, a novel linear reparameterization method that addresses codebook collapse by optimizing the entire linear space instead of individual code vectors.
Findings
Improves codebook utilization in VQ models
Effective across image and audio tasks
Easy to implement and generalizes well
Abstract
Vector Quantization (VQ) is essential for discretizing continuous representations in unsupervised learning but suffers from representation collapse, causing low codebook utilization and limiting scalability. Existing solutions often rely on complex optimizations or reduce latent dimensionality, which compromises model capacity and fails to fully solve the problem. We identify the root cause as disjoint codebook optimization, where only a few code vectors are updated via gradient descent. To fix this, we propose SimpleVQ (SimVQ), which reparameterizes code vectors through a learnable linear transformation layer over a latent basis, optimizing the entire linear space rather than nearest individual code vectors. Although the multiplication of two linear matrices is equivalent to applying a single linear layer, this simple approach effectively prevents collapse.…
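The contrast the abstract draws between updating individual code vectors and updating a shared linear layer can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes the latent basis is held fixed, uses only a squared-error quantization loss (no encoder, decoder, or commitment term), and all function and variable names (`vanilla_step`, `simvq_step`, `basis`, `W`) are hypothetical.

```python
import numpy as np

def nearest_code(z, codebook):
    """Index of the nearest code (squared Euclidean distance) for each row of z."""
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def vanilla_step(z, codebook, lr=0.1):
    """One gradient step on sum_i ||z_i - c_{k_i}||^2 with the codes themselves as parameters."""
    idx = nearest_code(z, codebook)
    grad = np.zeros_like(codebook)
    # Only the rows selected as nearest neighbours receive a gradient.
    np.add.at(grad, idx, 2.0 * (codebook[idx] - z))
    return codebook - lr * grad, idx

def simvq_step(z, basis, W, lr=0.1):
    """Same loss, but codes = basis @ W and only W is updated (basis assumed fixed)."""
    idx = nearest_code(z, basis @ W)
    selected = basis[idx]                    # (N, d) basis rows of the chosen codes
    residual = selected @ W - z              # (N, d) quantization error
    grad_W = 2.0 * selected.T @ residual     # (d, d) gradient w.r.t. the shared layer
    return W - lr * grad_W, idx

def count_moved(before, after, tol=1e-12):
    """Number of code vectors (rows) that changed between two codebooks."""
    return int((np.abs(after - before).max(axis=1) > tol).sum())

rng = np.random.default_rng(0)
K, d, N = 64, 8, 4                           # codebook size, latent dim, batch size
basis = rng.normal(size=(K, d))
z = rng.normal(size=(N, d))

# Vanilla VQ: the codebook is the parameter.
new_codebook, idx = vanilla_step(z, basis)
moved_vanilla = count_moved(basis, new_codebook)

# Reparameterized: start from W = I so both variants begin with the same codes.
W = np.eye(d)
new_W, _ = simvq_step(z, basis, W)
moved_simvq = count_moved(basis @ W, basis @ new_W)

print(f"vanilla: {moved_vanilla} of {K} codes moved")
print(f"reparameterized: {moved_simvq} of {K} codes moved")
```

In the vanilla step only the codes chosen as nearest neighbours are touched, whereas a step on `W` changes every row of `basis @ W`, which is one way to read "optimizing the entire linear space".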
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. The authors identify disjoint optimization of the codebook as the key reason for representation collapse in VQ.
2. The proposed approach, SimVQ, only requires adding one linear layer for implementation.
3. It maintains effectiveness with increasing codebook size.
1. The main contribution of the paper can be summed up as follows: if assigning one codeword to the encoder output causes representation collapse, one can use a weighted combination of the whole codebook. While this may work well in practice, I don't think a whole new paper is needed on this.
2. The toy example section, if included, needs to be properly fleshed out so that the reader can work out the example by hand.
3. Please add a reference to Residual Quantization with Implicit Neural Codeb
The paper is easy to follow. The proposed codebook reparameterization, which ensures updates across the entire codebook, is a promising approach. The solution is straightforward yet effectively addresses codebook collapse, and the experiments on reconstruction tasks are convincing.
**W1:** Limited Literature Review: The paper lacks a comprehensive review of related work, omitting several relevant methods that tackle codebook collapse, such as SQ-VAE [1], VQ-WAE [2], HVQ-VAE [3], and CVQ-VAE [4]. I find that the insight from Section 3.2 isn’t particularly novel. Codebook collapse is often attributed to the initialization process, where latents tend to concentrate around a few codebook vectors. During training, latents can easily overfit to these vectors (referred to as the
1. The idea is simple! I really like it.
2. It solves a problem that's annoying, important, but also relatively ignored by the community.
1. The method can be understood as a low-rank reparameterization. I felt that this method should have many connections to classic dictionary learning, sparse coding, and vector quantization, but these connections are not explored in the paper. I'm not that familiar with either the current VQ-VAE or dictionary learning literature, but I know this idea of a "dead unit" is not a new problem and people must have tried something. Maybe other reviewers or the AC can help if they're familiar with this literature.
1. A simple linear layer can be used to address the codebook collapse problem and obtain better reconstruction results.
2. Experiments were demonstrated on two modalities.
1. The idea of using a linear combination of "atoms" and "coefficients" in the latent space has been explored in existing works: 1) Auto-regressive Image Generation Using Residual Quantization; 2) SC-VAE: Sparse Coding-based Variational Autoencoder with Learned ISTA. These relevant works are not discussed in the related work section, and no comparisons with them are shown in the experiments section.
2. Though showing better reconstruction results. A very important aspect of vq-v
Taxonomy
Topics: Neural Networks and Reservoir Computing · Gene Regulatory Network Analysis · Scientific Computing and Data Management
