Scaling Image Tokenizers with Grouped Spherical Quantization
Jiangtao Wang, Zhen Qin, Yifan Zhang, Vincent Tao Hu, Björn Ommer, Rania Briq, Stefan Kesselheim

TL;DR
This paper introduces Grouped Spherical Quantization (GSQ), a novel method for scalable image tokenization that improves reconstruction quality and efficiency, enabling effective high-dimensional latent space representation and scaling.
Contribution
The paper proposes GSQ with spherical codebook initialization and lookup regularization, providing a new approach for scalable and high-quality image tokenization.
Findings
GSQ-GAN outperforms state-of-the-art methods in reconstruction quality.
GSQ enables efficient high-dimensional latent space representation.
Achieved 16x down-sampling with a reconstruction FID of 0.50.
Abstract
Vision tokenizers have gained considerable traction due to their scalability and compactness; however, previous works rely on outdated GAN-based hyperparameters, make biased comparisons, and lack a comprehensive analysis of scaling behaviours. To tackle these issues, we introduce Grouped Spherical Quantization (GSQ), featuring spherical codebook initialization and lookup regularization to constrain the codebook latents to a spherical surface. Our empirical analysis of image tokenizer training strategies demonstrates that GSQ-GAN achieves superior reconstruction quality over state-of-the-art methods with fewer training iterations, providing a solid foundation for scaling studies. Building on this, we systematically examine the scaling behaviours of GSQ, specifically in latent dimensionality, codebook size, and compression ratios, and their impact on model performance. Our findings reveal distinct…
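The quantization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the codebook is assumed to be shared across groups, and "lookup regularization" is approximated here as L2-normalizing each latent group before a nearest-neighbour lookup. It uses only the Python standard library.

```python
import math
import random


def l2_normalize(vec, eps=1e-12):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def init_spherical_codebook(num_codes, dim, seed=0):
    """Spherical initialization (assumed form): normalized Gaussian samples."""
    rng = random.Random(seed)
    return [
        l2_normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(num_codes)
    ]


def grouped_spherical_quantize(latent, codebook, num_groups):
    """Split `latent` into `num_groups` chunks and quantize each on the sphere.

    Returns (indices, quantized), where `quantized` is the concatenation of
    the selected unit-norm codes. Sharing one codebook across all groups is
    an assumption of this sketch.
    """
    group_dim = len(latent) // num_groups
    if group_dim * num_groups != len(latent):
        raise ValueError("latent length must be divisible by num_groups")
    indices, quantized = [], []
    for g in range(num_groups):
        chunk = l2_normalize(latent[g * group_dim:(g + 1) * group_dim])
        # On the unit sphere, the nearest code in Euclidean distance is the
        # one with the highest cosine similarity (dot product).
        sims = [sum(a * b for a, b in zip(chunk, code)) for code in codebook]
        best = max(range(len(codebook)), key=sims.__getitem__)
        indices.append(best)
        quantized.extend(codebook[best])
    return indices, quantized


codebook = init_spherical_codebook(num_codes=16, dim=4)
latent = [0.3, -1.2, 0.5, 2.0, -0.7, 0.1, 0.9, -0.4]
indices, quantized = grouped_spherical_quantize(latent, codebook, num_groups=2)
```

Because each group is normalized before the lookup, rescaling the latent leaves the selected indices unchanged; only the direction of each group matters in this sketch.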
Code & Models
- 🤗 HelmholtzAI-FZJ/GSQ-F8-D8-V64k (model; 2 downloads, 1 like)
- 🤗 HelmholtzAI-FZJ/GSQ-F8-D32-G2-V256k (model)
- 🤗 HelmholtzAI-FZJ/GSQ-F8-D64-G16-V8k (model; 7 downloads)
- 🤗 HelmholtzAI-FZJ/GSQ-F8-D64-G2-V8k (model; 2 downloads)
- 🤗 HelmholtzAI-FZJ/GSQ-F8-D64-G4-V256k (model)
- 🤗 HelmholtzAI-FZJ/GSQ-F8-D64-V256k (model)
- 🤗 HelmholtzAI-FZJ/GSQ-F8-D16-V16k (model)
- 🤗 HelmholtzAI-FZJ/GSQ-F8-D16-V256k (model)
- 🤗 HelmholtzAI-FZJ/GSQ-F8-D16-V512k (model; 3 downloads)
- 🤗 HelmholtzAI-FZJ/GSQ-F8-D16-V64k (model)
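The checkpoint names above appear to follow a pattern. The sketch below parses them under assumptions this page does not confirm: that F is the spatial down-sampling factor, D the latent dimension, G the number of groups (taken as 1 when absent), and V the codebook size, with "k" read as 1024. The helper `parse_gsq_name` is hypothetical and is not part of any released library.

```python
import re

# Assumed meaning of each field letter; not confirmed by the source page.
FIELD_NAMES = {
    "F": "downsample_factor",
    "D": "latent_dim",
    "G": "groups",
    "V": "codebook_size",
}


def parse_gsq_name(name, kilo=1024):
    """Parse a name such as 'GSQ-F8-D64-G16-V8k' into a dict of integers.

    `kilo` is the multiplier applied to a trailing 'k' (assumed to be 1024).
    """
    model = name.split("/")[-1]      # drop an optional 'org/' prefix
    parsed = {"groups": 1}           # assumption: no G field means one group
    for token in model.split("-")[1:]:
        match = re.fullmatch(r"([FDGV])(\d+)(k?)", token)
        if match is None:
            raise ValueError(f"unrecognized field: {token!r}")
        letter, digits, suffix = match.groups()
        parsed[FIELD_NAMES[letter]] = int(digits) * (kilo if suffix else 1)
    return parsed


config = parse_gsq_name("HelmholtzAI-FZJ/GSQ-F8-D64-G16-V8k")
```

If the assumed reading holds, `config["latent_dim"] // config["groups"]` gives the per-group dimension, which is 4 for this checkpoint.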
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques
