Masked Vector Quantization
David D. Nguyen, David Leibowitz, Surya Nepal, Salil S. Kanhere

TL;DR
The paper introduces Masked Vector Quantization (MVQ), a framework that increases the representational capacity of each code vector in discrete-latent generative models, improving sampling speed and generative quality on ImageNet 64x64 with fewer tokens and codebook entries.
Contribution
MVQ increases code vector capacity through mask learning with MH-Dropout, reducing sampling time and improving generative quality in vector quantization architectures.
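For context, standard vector quantization maps each encoder output to its nearest codebook entry, and a mask applied to the selected code vector lets one entry stand for several distinct vectors. The following is a minimal sketch of that idea, assuming Euclidean nearest-neighbor lookup and a binary element-wise mask. The names `quantize` and `apply_mask` and the toy codebook are hypothetical illustrations, not taken from the paper.

```python
def quantize(z, codebook):
    """Return (index, code vector) of the codebook entry nearest to z."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    idx = min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
    return idx, codebook[idx]


def apply_mask(code, mask):
    """Zero out the masked-off dimensions of a code vector.

    Assumption: the mask is binary and element-wise; the summary above
    does not specify the exact form of the mask.
    """
    return [c * m for c, m in zip(code, mask)]


# Toy codebook with two entries of dimension 3 (illustrative values only).
codebook = [[1.0, 0.5, 2.0], [0.0, 3.0, 1.0]]
idx, code = quantize([0.9, 0.2, 1.7], codebook)
masked = apply_mask(code, [1, 0, 1])  # keep dimensions 0 and 2, drop dimension 1
```

Under this assumption, a single code vector of dimension D can yield up to 2^D masked variants, which is one way to read the claim that masking increases the capacity of each codebook entry.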
Findings
Up to 68% FID reduction on ImageNet 64x64
7-45x faster token sampling during inference
Smaller latent spaces enable transferable visual representations
Abstract
Generative models with discrete latent representations have recently demonstrated an impressive ability to learn complex high-dimensional data distributions. However, their performance relies on a long sequence of tokens per instance and a large number of codebook entries, resulting in long sampling times and considerable computation to fit the categorical posterior. To address these issues, we propose the Masked Vector Quantization (MVQ) framework, which increases the representational capacity of each code vector by learning mask configurations via a stochastic winner-takes-all training regime called Multiple Hypothesis Dropout (MH-Dropout). On ImageNet 64x64, MVQ reduces FID in existing vector quantization architectures by up to … at 2 tokens per instance and … at 5 tokens. These improvements widen as the number of codebook entries is reduced and allow for …
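The stochastic winner-takes-all idea named in the abstract can be illustrated with a small sketch: several hypotheses are produced from randomly sampled masks, and only the hypothesis closest to the target is kept (in a trainable model, only that winner would receive a gradient). This is a simplified illustration under stated assumptions, not the authors' MH-Dropout implementation. The functions `sample_mask` and `winner_takes_all`, the squared-error loss, and the Bernoulli mask sampling are all hypothetical choices made for the example.

```python
import random


def sample_mask(dim, keep_prob, rng):
    """Draw a random binary mask (assumed Bernoulli, as in ordinary dropout)."""
    return [1 if rng.random() < keep_prob else 0 for _ in range(dim)]


def winner_takes_all(code, target, num_hypotheses, keep_prob, rng):
    """Sample masked versions of `code` and return the best (loss, mask, hypothesis).

    Each hypothesis is the code vector with one sampled mask applied.
    The winner is the hypothesis with the lowest squared error to `target`.
    """
    best = None
    for _ in range(num_hypotheses):
        mask = sample_mask(len(code), keep_prob, rng)
        hypothesis = [c * m for c, m in zip(code, mask)]
        loss = sum((h - t) ** 2 for h, t in zip(hypothesis, target))
        if best is None or loss < best[0]:
            best = (loss, mask, hypothesis)
    return best


code = [1.0, 0.5, 2.0]     # illustrative code vector
target = [1.0, 0.0, 2.0]   # illustrative reconstruction target
loss, mask, hypothesis = winner_takes_all(code, target, 8, 0.5, random.Random(0))
```

Sampling more hypotheses can only lower (or keep) the winning loss for a fixed random seed, which is why winner-takes-all schemes tend to push different masks toward different modes of the data.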
Taxonomy
Topics: AI in cancer detection · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques
Methods: Dropout
