GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting
Jiajun Dong, Chengkun Wang, Wenzhao Zheng, Lei Chen, Jiwen Lu, Yansong Tang

TL;DR
GaussianToken introduces a novel image tokenizer that uses 2D Gaussian splatting to enhance representational capacity, improving image reconstruction quality for multi-modal tasks.
Contribution
The paper proposes GaussianToken, which models encoded samples as 2D Gaussians, integrating local influence into discrete space to improve image tokenization.
Findings
Achieves competitive reconstruction on CIFAR, Mini-ImageNet, ImageNet-1K
Enhances representation ability over traditional VQ-based tokenizers
Demonstrates effectiveness in multi-modal understanding and generation tasks
Abstract
Effective image tokenization is crucial for both multi-modal understanding and generation tasks due to the necessity of alignment with discrete text data. To this end, existing approaches utilize vector quantization (VQ) to project pixels onto a discrete codebook and reconstruct images from the discrete representation. However, compared with the continuous latent space, the limited discrete codebook space significantly restricts the representational ability of these image tokenizers. In this paper, we propose GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting as a solution. We first represent the encoded samples as multiple flexible featured 2D Gaussians characterized by positions, rotation angles, scaling factors, and feature coefficients. We adopt the standard quantization for the Gaussian features and then concatenate the quantization results with the other…
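The two ingredients named in the abstract, nearest-codebook vector quantization and 2D Gaussians parameterized by position, rotation angle, and scaling factors, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (quantize, gaussian_weight, splat), the dictionary layout of a Gaussian, and the choice to render by a plain weighted sum of quantized features are all assumptions made for illustration, and the abstract's truncated description of how the quantized features are combined with the remaining parameters is not reproduced here.

```python
import math

def quantize(feature, codebook):
    """Return (index, entry) of the codebook entry nearest to `feature`
    under squared Euclidean distance (standard VQ lookup)."""
    best = min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(feature, codebook[k])),
    )
    return best, codebook[best]

def gaussian_weight(x, y, mean, theta, scale):
    """Unnormalized 2D Gaussian exp(-0.5 * d^T Sigma^{-1} d), where
    Sigma = R S S^T R^T, R is a rotation by `theta`, S = diag(scale)."""
    dx, dy = x - mean[0], y - mean[1]
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the offset into the Gaussian's local axes (R^T d).
    u = c * dx + s * dy
    v = -s * dx + c * dy
    return math.exp(-0.5 * ((u / scale[0]) ** 2 + (v / scale[1]) ** 2))

def splat(gaussians, codebook, height, width):
    """Hypothetical renderer: each pixel accumulates the quantized feature
    of every Gaussian, weighted by that Gaussian's local influence."""
    dim = len(codebook[0])
    out = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for g in gaussians:
        _, q = quantize(g["feature"], codebook)
        for y in range(height):
            for x in range(width):
                w = gaussian_weight(x, y, g["mean"], g["theta"], g["scale"])
                for k in range(dim):
                    out[y][x][k] += w * q[k]
    return out

# Example: one Gaussian centred on the middle pixel of a 3x3 grid.
codebook = [[0.0, 0.0], [1.0, 1.0]]
gaussians = [{"mean": (1, 1), "theta": 0.0, "scale": (1.0, 1.0),
              "feature": [0.9, 0.8]}]
feature_map = splat(gaussians, codebook, 3, 3)
```

In this sketch only the feature vector is discretized, while position, rotation, and scale stay continuous, which is one way to read the abstract's point that the Gaussian parameters add capacity beyond the codebook alone.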
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Advanced Steganography and Watermarking Techniques
Methods: ADaptive gradient method with the OPTimal convergence rate
