GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation
Xuan Zhao, Zhongyu Zhang, Yuge Huang, Yuxi Mi, Guodong Mu, Shouhong Ding, Jun Wang, Rizen Guo, Shuigeng Zhou

TL;DR
GloTok introduces a global relational approach to image tokenization that promotes a more uniform semantic distribution of tokenized features. This improves image reconstruction and generation, and GloTok outperforms existing methods on ImageNet-1k.
Contribution
The paper proposes GloTok, a global perspective tokenizer. GloTok uses codebook-wise histogram relation learning to transfer dataset-level semantics from pre-trained models to a semantic codebook, which gives the semantic features a more uniform distribution.
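The summary does not say how uniformity is measured. As a rough illustration only, the sketch below makes two assumptions. First, it uses a standard vector-quantization step that assigns each feature to its nearest codebook entry under squared L2 distance. Second, it takes the normalized entropy of the codebook usage histogram as a proxy for how evenly tokens spread across the codebook. The helper names (`quantize`, `usage_histogram`, `normalized_entropy`) and the entropy metric are hypothetical choices. They are not GloTok's code or evaluation protocol.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = Sequence[float]


def quantize(features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry (squared L2)."""
    indices = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook]
        indices.append(min(range(len(codebook)), key=dists.__getitem__))
    return indices


def usage_histogram(indices: List[int], codebook_size: int) -> List[float]:
    """Fraction of tokens assigned to each codebook entry (sums to 1)."""
    counts = Counter(indices)
    total = len(indices)
    return [counts.get(k, 0) / total for k in range(codebook_size)]


def normalized_entropy(hist: List[float]) -> float:
    """Entropy of the usage histogram divided by log(K); 1.0 means perfectly uniform usage.

    Hypothetical uniformity proxy, not a metric reported for GloTok.
    """
    if len(hist) < 2:
        raise ValueError("need at least two codebook entries")
    h = -sum(p * math.log(p) for p in hist if p > 0)
    return h / math.log(len(hist))


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    feats = [[0.1, 0.0], [0.9, 0.1], [0.0, 0.8], [1.0, 1.1]]
    idx = quantize(feats, codebook)
    print(idx)  # [0, 1, 2, 3]
    print(normalized_entropy(usage_histogram(idx, len(codebook))))  # ~1.0: every entry used once
```

A value near 1.0 means tokens are spread evenly over the codebook. A value near 0.0 means a few entries dominate, which is the usual sign of codebook collapse.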
Findings
Achieves state-of-the-art reconstruction performance on ImageNet-1k.
Generates higher-quality images than previous methods.
Facilitates autoregressive model training without requiring access to pre-trained models during training.
Abstract
Existing state-of-the-art image tokenization methods use diverse semantic features from pre-trained vision models as additional supervision. This supervision expands the distribution of latent representations and thereby improves the quality of image reconstruction and generation. However, these methods apply semantic supervision locally, which limits the uniformity of the semantic distribution, and VA-VAE has shown that a more uniform feature distribution yields better generation performance. In this work, we introduce a Global Perspective Tokenizer (GloTok), which uses global relational information to model a more uniform semantic distribution of tokenized features. Specifically, we propose a codebook-wise histogram relation learning method that transfers the semantics, which pre-trained models capture across the entire dataset, to the semantic codebook. Then, we design a…
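This summary does not give the exact loss behind codebook-wise histogram relation learning. The following is a minimal, hypothetical sketch of one form a histogram-based relational transfer could take. It rests on three assumptions:

(a) Each image's tokens are softly assigned to the semantic codebook, producing one histogram per image.
(b) Pre-trained features are softly assigned to a set of global prototypes precomputed over the entire dataset, for example k-means centroids.
(c) A mean squared error matches the pairwise cosine relations between per-image histograms in the two spaces.

The temperature `tau`, all function names, and the loss form are illustrative assumptions, not the authors' method. Because only relations are compared, the codebook and the prototype set may differ in size. The pure-Python code computes only the loss value; training would need an autodiff framework.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def soft_histogram(tokens: List[Vector], centers: List[Vector], tau: float = 1.0) -> List[float]:
    """Average softmax assignment of one image's tokens over a set of centers.

    `centers` is the semantic codebook (student side) or a set of global
    prototypes precomputed from pre-trained features (teacher side; assumed).
    """
    hist = [0.0] * len(centers)
    for t in tokens:
        logits = [-sum((a - b) ** 2 for a, b in zip(t, c)) / tau for c in centers]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        for k, e in enumerate(exps):
            hist[k] += e / z
    return [h / len(tokens) for h in hist]


def relation_matrix(hists: List[List[float]]) -> List[List[float]]:
    """N x N cosine similarities between per-image histograms (histograms must be nonzero)."""
    norms = [math.sqrt(sum(x * x for x in h)) for h in hists]
    return [
        [sum(a * b for a, b in zip(hi, hj)) / (ni * nj) for hj, nj in zip(hists, norms)]
        for hi, ni in zip(hists, norms)
    ]


def histogram_relation_loss(
    student_hists: List[List[float]], teacher_hists: List[List[float]]
) -> float:
    """Mean squared difference between student and teacher relation matrices (hypothetical loss)."""
    rs = relation_matrix(student_hists)
    rt = relation_matrix(teacher_hists)
    n = len(rs)
    return sum((rs[i][j] - rt[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)
```

In a real training loop, this term would presumably be weighted and added to the usual reconstruction and quantization losses. That weighting is likewise an assumption.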
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques · Image Enhancement Techniques
