MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization

Mingkai Jia; Wei Yin; Xiaotao Hu; Jiaxin Guo; Xiaoyang Guo; Qian Zhang; Xiao-Xiao Long; Ping Tan

arXiv:2507.07997 · cs.CV · July 15, 2025

TL;DR

This paper introduces MGVQ, a vector quantization method that improves the reconstruction quality of VQ-VAEs. It outperforms existing VQ-VAEs on ImageNet and on zero-shot benchmarks, including high-resolution images.

Contribution

MGVQ retains the latent dimension and quantizes with a set of sub-codebooks (multi-group quantization). This eases codebook optimization and reduces information loss, yielding state-of-the-art reconstruction performance among VQ-VAEs.

Findings

01 Outperforms existing VQ-VAEs on ImageNet with lower rFID scores.

02 Achieves superior PSNR on all zero-shot benchmarks.

03 Enhances reconstruction quality for high-resolution images.
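
For readers unfamiliar with the PSNR metric named above, the following is a minimal sketch of the standard definition, 10 · log10(MAX² / MSE). The function name `psnr`, the `max_value` default of 255, and the use of NumPy are assumptions made for illustration; this page does not state the paper's exact evaluation protocol (data range, color space, or averaging).

```python
import numpy as np

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    max_value is the largest possible pixel value (assumed 255 for 8-bit data).
    """
    reference = np.asarray(reference, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    # Mean squared error over all pixels and channels.
    mse = np.mean((reference - reconstruction) ** 2)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Higher values mean the reconstruction is closer to the reference, so a tokenizer with less information loss scores a higher PSNR.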

Abstract

Vector Quantized Variational Autoencoders (VQ-VAEs) are fundamental models that compress continuous visual data into discrete tokens. Existing methods have tried to improve the quantization strategy for better reconstruction quality; however, a large gap remains between VQ-VAEs and VAEs. To narrow this gap, we propose MGVQ, a novel method that augments the representation capability of discrete codebooks, making the codebooks easier to optimize and minimizing information loss, thereby enhancing reconstruction quality. Specifically, we propose to retain the latent dimension to preserve encoded features and to incorporate a set of sub-codebooks for quantization. Furthermore, we construct comprehensive zero-shot benchmarks featuring resolutions of 512p and 2k to rigorously evaluate the reconstruction performance of existing methods. MGVQ achieves the state-of-the-art performance…
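
One plausible reading of "retain the latent dimension" and "a set of sub-codebooks" is sketched below: each latent vector keeps its full width, is split into channel groups, and each group is snapped to the nearest entry of its own sub-codebook. This is an illustrative sketch, not the authors' implementation. The function name `multi_group_quantize`, the array layout, and the group and codebook sizes are assumptions, and training details (losses, gradient estimation) are left out.

```python
import numpy as np

def multi_group_quantize(z, codebooks):
    """Quantize each latent vector group by group.

    z:         (N, D) latent vectors.
    codebooks: (G, K, D // G) array, one sub-codebook of K entries per group.
    Returns the quantized latents (N, D) and the chosen indices (N, G).
    """
    n, d = z.shape
    g, k, d_sub = codebooks.shape
    assert d == g * d_sub, "latent width must equal groups * sub-vector width"
    # Split every latent vector into G contiguous sub-vectors.
    groups = z.reshape(n, g, d_sub)
    # Squared Euclidean distance from each sub-vector to every entry
    # of the sub-codebook belonging to its group: shape (N, G, K).
    diff = groups[:, :, None, :] - codebooks[None, :, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # Nearest entry per group.
    idx = dist.argmin(axis=-1)
    # Gather the selected entries and restore the original latent width.
    quantized = codebooks[np.arange(g)[None, :], idx]
    return quantized.reshape(n, d), idx

# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
demo_codebooks = rng.normal(size=(4, 16, 2))  # G=4 groups, K=16 entries each
demo_latents = rng.normal(size=(5, 8))        # N=5 vectors of width D=8
demo_quantized, demo_indices = multi_group_quantize(demo_latents, demo_codebooks)
```

Under this reading, each latent vector is represented by G indices, so G sub-codebooks of K entries can express K^G distinct combinations while each individual codebook stays small.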

Code & Models

Repositories

MKJia/MGVQ (official)

Models

mkjia/MGVQ
model · ♡ 2

Datasets

mkjia/UHDBench
dataset · 15 dl

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Advanced Neural Network Applications