BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
Shaozhe Hao, Xuantong Liu, Xianbiao Qi, Shihao Zhao, Bojia Zi, Rong Xiao, Kai Han, Kwan-Yee K. Wong

TL;DR
BiGR is a conditional image generation model built on compact binary latent codes. It unifies image generation and visual representation learning in one framework, reporting strong generation quality (FID-50k), strong linear-probe accuracy, and zero-shot generalization across multiple vision tasks.
Contribution
BiGR is presented as the first conditional generative model to unify generation and discrimination within a single framework. It is built on binary latent codes and introduces an entropy-ordered sampling method for efficient image generation.
Findings
BiGR achieves superior generation quality, as measured by FID-50k.
Its representations reach high linear-probe accuracy.
It enables zero-shot tasks such as inpainting, outpainting, and editing.
Abstract
We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities. BiGR is the first conditional generative model that unifies generation and discrimination within the same framework. BiGR features a binary tokenizer, a masked modeling mechanism, and a binary transcoder for binary code prediction. Additionally, we introduce a novel entropy-ordered sampling method to enable efficient image generation. Extensive experiments validate BiGR's superior performance in generation quality, as measured by FID-50k, and representation capabilities, as evidenced by linear-probe accuracy. Moreover, BiGR showcases zero-shot generalization across various vision tasks, enabling applications such as image inpainting, outpainting, editing, interpolation, and enrichment, without the…
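The abstract names entropy-ordered sampling but does not describe the procedure. The sketch below shows one plausible reading, not the authors' implementation: each token is a vector of bits with predicted Bernoulli probabilities, a token's uncertainty is the sum of its per-bit entropies, and masked tokens are committed in order of increasing entropy over a fixed number of steps. The names `predict`, `entropy_ordered_sample`, the lowest-entropy-first direction, and the even per-step schedule are all assumptions made for illustration.

```python
import math
import random


def bernoulli_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def token_entropy(bit_probs):
    """Sum of per-bit entropies, treating the bits as independent."""
    return sum(bernoulli_entropy(p) for p in bit_probs)


def entropy_ordered_sample(predict, num_tokens, num_steps, rng):
    """Iteratively unmask tokens, lowest predicted entropy first.

    `predict(codes)` is a hypothetical stand-in for the model: it receives
    the current codes (None marks a masked position) and returns, for every
    position, a list of Bernoulli probabilities, one per bit.
    """
    codes = [None] * num_tokens
    for step in range(num_steps):
        masked = [i for i, c in enumerate(codes) if c is None]
        if not masked:
            break
        probs = predict(codes)
        # Assumed schedule: spread the remaining masked tokens evenly
        # over the remaining steps, so the last step fills everything.
        k = math.ceil(len(masked) / (num_steps - step))
        # Assumed order: commit the most confident positions first.
        masked.sort(key=lambda i: token_entropy(probs[i]))
        for i in masked[:k]:
            codes[i] = [1 if rng.random() < p else 0 for p in probs[i]]
    return codes
```

In this sketch, `predict` would be replaced by a forward pass of the masked model, and `rng` by any object with a `random()` method, such as `random.Random(seed)`. Committing confident tokens early gives later predictions more context to condition on, which is the usual motivation for confidence-ordered unmasking.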
Taxonomy
Topics: Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
