Vector Quantization using Gaussian Variational Autoencoder
Tongda Xu, Wendi Zheng, Jiajun He, Jose Miguel Hernandez-Lobato, Yan Wang, Ya-Qin Zhang, Jie Tang

TL;DR
This paper introduces Gaussian Quant (GQ), a novel method for converting Gaussian VAEs into VQ-VAEs without extra training, improving discretization and performance in image compression tasks.
Contribution
The paper presents GQ, a simple technique for transforming Gaussian VAEs into VQ-VAEs, together with a training heuristic, the target divergence constraint (TDC), that makes the conversion effective. GQ outperforms existing VQ-VAEs.
Findings
GQ surpasses previous VQ-VAEs like VQGAN and FSQ in performance.
TDC enhances Gaussian VAE discretization methods such as TokenBridge.
A small quantization error is theoretically guaranteed when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE.
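The codebook-size condition in the last finding can be checked numerically for a single latent dimension. The sketch below is illustrative only: it assumes the bits-back coding rate is measured by the KL divergence between the posterior N(mu, sigma^2) and a standard normal prior, which this summary does not state explicitly. The function names `gaussian_kl_bits` and `codebook_is_large_enough` are hypothetical, and the paper's theorem may carry additional terms or constants.

```python
import math

def gaussian_kl_bits(mu: float, sigma: float) -> float:
    """KL( N(mu, sigma^2) || N(0, 1) ) expressed in bits.

    Assumption: this KL term stands in for the per-dimension
    bits-back coding rate of the Gaussian VAE.
    """
    kl_nats = 0.5 * (mu ** 2 + sigma ** 2 - 1.0 - math.log(sigma ** 2))
    return kl_nats / math.log(2.0)

def codebook_is_large_enough(mu: float, sigma: float, codebook_size: int) -> bool:
    """True when log2(codebook size) exceeds the assumed rate in bits."""
    return math.log2(codebook_size) > gaussian_kl_bits(mu, sigma)
```

Under these assumptions, a posterior equal to the prior (mu = 0, sigma = 1) has zero rate, so any codebook with more than one entry passes the check, while a sharply peaked posterior far from the origin needs a much larger codebook.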
Abstract
Vector-quantized variational autoencoders (VQ-VAEs) are discrete autoencoders that compress images into discrete tokens. However, they are difficult to train due to discretization. In this paper, we propose a simple yet effective technique dubbed Gaussian Quant (GQ), which first trains a Gaussian VAE under certain constraints and then converts it into a VQ-VAE without additional training. For conversion, GQ generates random Gaussian noise as a codebook and finds the closest noise vector to the posterior mean. Theoretically, we prove that when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE, a small quantization error is guaranteed. Practically, we propose a heuristic to train Gaussian VAEs for effective conversion, named the target divergence constraint (TDC). Empirically, we show that GQ outperforms previous VQ-VAEs, such as VQGAN, FSQ, LFQ, and…
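The conversion step described in the abstract (draw Gaussian noise as a codebook, then pick the codeword closest to the posterior mean) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a one-dimensional standard normal codebook shared across latent dimensions and an absolute-difference distance. The function name `gaussian_quant`, the codebook size `K`, and the stand-in posterior means `mu` are all hypothetical.

```python
import numpy as np

def gaussian_quant(mu: np.ndarray, codebook: np.ndarray):
    """Quantize each entry of mu to its nearest codeword.

    mu:       shape (d,), posterior means from a Gaussian VAE encoder
    codebook: shape (K,), random Gaussian noise used as the codebook
    Returns the token indices, shape (d,), and the quantized values, shape (d,).
    """
    dist = np.abs(mu[:, None] - codebook[None, :])  # pairwise distances, shape (d, K)
    idx = dist.argmin(axis=1)                       # nearest codeword per dimension
    return idx, codebook[idx]

rng = np.random.default_rng(0)
K = 1024                             # assumed codebook size
codebook = rng.standard_normal(K)    # codebook is plain N(0, 1) noise, never trained
mu = rng.standard_normal(16)         # stand-in for encoder posterior means
idx, mu_hat = gaussian_quant(mu, codebook)
max_err = np.abs(mu - mu_hat).max()  # worst-case quantization error in this draw
```

Because the codebook is sampled rather than learned, the only trained component is the Gaussian VAE itself. In this sketch the error typically shrinks as `K` grows, which is the behavior the theoretical result formalizes.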
Peer Reviews
Decision: Submitted to ICLR 2026
- The idea is interesting and theoretically grounded.
- Practical method TDC is a lightweight tweak that improves the downstream discretization quality.
- Empirical results and ablations are reported across architectures.

- The paper discusses “grouping to multiple dimensions” for very low‑bitrate regimes, but it’s unclear whether grouping is evaluated as a separate ablation in the main experiments. Clarifying this would help.
- As GQ quantizes each dimension of the posterior mean to the closest codeword, this results in much longer discrete token sequences compared to VQ-VAE variants that quantize sub-vectors, making the generation step more challenging and computationally expensive.

- While the idea of converting a Gaussian VAE trained in continuous space into a VQ-VAE is simple, the paper clearly motivates the story by first showing that direct conversion from a vanilla Gaussian VAE does not perform well, and then demonstrating that TDC enables training of Gaussian VAEs suitable for quantization.
- The paper establishes a theoretical relationship between codebook size and bits-back coding rate. By proving theorems showing that quantization error decreases doubly exponenti

- The core claim of the method is that "a pre-trained Gaussian VAE can be converted to VQ without additional training," but in practice, applying GQ to a Gaussian VAE trained without TDC constraint results in significant performance degradation (PSNR: 26.43 dB vs 32.11 dB in Table 6). In other words, "GQ is training-free" only applies to the conversion step itself, and there is a prerequisite of "training a Gaussian VAE with TDC" in the preceding stage. While the paper mentions conversion from a
1. The paper is well structured and is easy to follow.
2. The idea of discussing the relationship between quantization error and bits-back encoding is interesting.
3. The effectiveness of the proposed method is consistently demonstrated in the experiments.
1. The method for determining the $\lambda$ parameters in TDC seems questionable. According to Eq. (6) and (7), when there is no dimension to which $\lambda_\mathrm{min}$ and $\lambda_\mathrm{max}$ are applied, $\lambda_\mathrm{min}$ and $\lambda_\mathrm{max}$ are multiplied by 0.99 at each step. Is there a risk that these parameters become extremely small when needed again?
2. Further explanation is needed for why Eq. (8) encourages greater codebook usage and entropy, and why Eq. (8) takes its
Taxonomy
Topics: Advanced Data Compression Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
