2D Gaussians Meet Visual Tokenizer

Yiang Shi; Xiaoyang Guo; Wei Yin; Mingkai Jia; Qian Zhang; Xiaolin Hu; Wenyu Liu; Xinggang Wang

arXiv:2508.13515 · cs.CV · August 21, 2025


TL;DR

This paper introduces Visual Gaussian Quantization (VGQ), a novel image tokenizer that models geometric structures using 2D Gaussians, significantly improving image reconstruction quality over existing patch-based methods.

Contribution

VGQ explicitly incorporates 2D Gaussian distributions into visual tokenization, enhancing structural modeling and reconstruction fidelity beyond traditional quantization methods.
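
As a rough illustration of the two ingredients named here, the sketch below pairs a generic nearest-neighbour codebook lookup (the quantization step used by VQ-style tokenizers) with the evaluation of a single anisotropic 2D Gaussian on a pixel grid. It is not the paper's implementation: the function names `quantize` and `gaussian_2d`, the `(mean, scale, theta)` parameterization, and the array shapes are all assumptions made for this example, and how VGQ actually couples Gaussians with codebook entries is not described in this summary.

```python
import numpy as np


def quantize(features, codebook):
    """Generic VQ-style lookup: map each feature to its nearest codebook entry.

    features: (N, D) array, codebook: (K, D) array (hypothetical shapes).
    Returns the chosen indices (N,) and the quantized features (N, D).
    """
    # Squared Euclidean distance between every feature and every code.
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d2.argmin(axis=1)
    return indices, codebook[indices]


def gaussian_2d(height, width, mean, scale, theta):
    """Evaluate an unnormalised anisotropic 2D Gaussian on a pixel grid.

    mean = (x, y) centre, scale = (sx, sy) standard deviations along the
    Gaussian's own axes, theta = rotation in radians. This parameterization
    is an assumption for illustration, not taken from the paper.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    dx, dy = xs - mean[0], ys - mean[1]

    # Covariance = R diag(sx^2, sy^2) R^T
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    cov = rot @ np.diag([scale[0] ** 2, scale[1] ** 2]) @ rot.T
    inv = np.linalg.inv(cov)

    # Mahalanobis distance squared, expanded for the 2x2 case.
    m2 = inv[0, 0] * dx**2 + 2.0 * inv[0, 1] * dx * dy + inv[1, 1] * dy**2
    return np.exp(-0.5 * m2)


if __name__ == "__main__":
    codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
    feats = np.array([[0.1, -0.1], [0.9, 1.2]])
    idx, quantized = quantize(feats, codebook)
    print(idx)

    g = gaussian_2d(9, 9, mean=(4, 4), scale=(2.0, 1.0), theta=0.0)
    print(g.shape, g[4, 4])
```

The Gaussian is indexed as `g[y, x]`, so with `scale=(2.0, 1.0)` and no rotation it falls off more slowly along the horizontal axis than the vertical one, which is the kind of oriented, elongated footprint that a square patch cannot express directly.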

Findings

01 VGQ achieves an rFID score of 1.00 on ImageNet 256x256.

02 VGQ outperforms existing methods with an rFID of 0.556.

03 Increasing Gaussian density improves reconstruction quality.

Abstract

The image tokenizer is a critical component in autoregressive (AR) image generation, as it determines how rich and structured visual content is encoded into compact representations. Existing quantization-based tokenizers such as VQ-GAN primarily focus on appearance features like texture and color, often neglecting geometric structures due to their patch-based design. In this work, we explore how to incorporate more visual information into the tokenizer and propose Visual Gaussian Quantization (VGQ), a novel tokenizer paradigm that explicitly enhances structural modeling by integrating 2D Gaussians into traditional visual codebook quantization frameworks. Our approach addresses the inherent limitations of naive quantization methods such as VQ-GAN, which struggle to model structured visual information due to their patch-based design and emphasis on texture and color. In contrast,…


Taxonomy

Topics: Image Retrieval and Classification Techniques