VP-VAE: Rethinking Vector Quantization via Adaptive Vector Perturbation
Linwei Zhai, Han Ding, Mingzhi Lin, Cui Zhao, Fei Wang, Ge Wang, Wang Zhi, Wei Xi

TL;DR
VP-VAE replaces the explicit codebook used when training vector-quantized generative models with adaptive latent perturbations, which stabilizes training and improves reconstruction fidelity and token usage.
Contribution
The paper proposes VP-VAE, a new paradigm that decouples representation learning from discretization. It removes the need for an explicit codebook during training, which improves training stability.
Findings
Improved reconstruction fidelity on image and audio benchmarks.
More balanced token usage compared to traditional VQ-VAEs.
Enhanced robustness to inference-time quantization errors.
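The summary does not say how token balance was measured. One common way to quantify it is the perplexity of the empirical token distribution, sketched below. The function name `usage_perplexity` and the toy token lists are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def usage_perplexity(token_ids):
    """Perplexity, exp(entropy), of the empirical token distribution.

    It equals the number of distinct tokens when usage is perfectly
    uniform and approaches 1 when a single token dominates.
    """
    counts = Counter(token_ids)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


# Toy data (assumed): four tokens used evenly vs. one token dominating.
balanced = [0, 1, 2, 3] * 25
collapsed = [0] * 97 + [1, 2, 3]

print(usage_perplexity(balanced))
print(usage_perplexity(collapsed))
```

A higher value on the same vocabulary indicates more even use of the available tokens, so the collapsed list scores far lower than the balanced one.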
Abstract
Vector Quantized Variational Autoencoders (VQ-VAEs) are fundamental to modern generative modeling, yet they often suffer from training instability and "codebook collapse" due to the inherent coupling of representation learning and discrete codebook optimization. In this paper, we propose VP-VAE (Vector Perturbation VAE), a novel paradigm that decouples representation learning from discretization by eliminating the need for an explicit codebook during training. Our key insight is that, from the neural network's viewpoint, performing quantization primarily manifests as injecting a structured perturbation in latent space. Accordingly, VP-VAE replaces the non-differentiable quantizer with distribution-consistent and scale-adaptive latent perturbations generated via Metropolis-Hastings sampling. This design enables stable training without a codebook while making the model robust to…
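The idea of perturbing a latent with Metropolis-Hastings samples can be illustrated with a generic random-walk sampler. This is a minimal sketch under stated assumptions, not the paper's algorithm: the abstract does not specify the target distribution, the proposal, or how the scale is adapted. Here the target is assumed to be an isotropic Gaussian centered on the latent, and `perturb_latent`, `scale`, `n_steps`, and `seed` are hypothetical names.

```python
import math
import random


def metropolis_hastings(log_density, start, step, n_steps, rng):
    """Generic random-walk Metropolis-Hastings; returns the final state."""
    x = list(start)
    logp = log_density(x)
    for _ in range(n_steps):
        proposal = [xi + rng.gauss(0.0, step) for xi in x]
        logp_new = log_density(proposal)
        # The Gaussian proposal is symmetric, so the acceptance
        # probability reduces to min(1, p_new / p_old).
        if math.log(rng.random() + 1e-300) < logp_new - logp:
            x, logp = proposal, logp_new
    return x


def perturb_latent(z, scale, n_steps=50, seed=0):
    """Hypothetical stand-in for a quantizer: return a sample near z.

    Assumption: the target is an isotropic Gaussian of width `scale`
    centered on z. The paper's actual target and scale rule are not
    given in the abstract.
    """
    rng = random.Random(seed)

    def log_density(x):
        sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
        return -sq_dist / (2.0 * scale ** 2)

    return metropolis_hastings(log_density, z, scale, n_steps, rng)


z = [0.5, -1.2, 0.3]
z_perturbed = perturb_latent(z, scale=0.1)
print(z_perturbed)
```

In a training loop the decoder would receive the perturbed latent in place of a quantized one. Because the perturbation is added as noise rather than computed by a nearest-neighbor lookup, no non-differentiable step sits between encoder and decoder in this sketch.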
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Domain Adaptation and Few-Shot Learning
