Improved Masked Image Generation with Token-Critic
José Lezama, Huiwen Chang, Lu Jiang, Irfan Essa

TL;DR
This paper introduces Token-Critic, an auxiliary model that guides non-autoregressive image generation by deciding which sampled tokens to accept and which to resample. On class-conditional ImageNet synthesis, it improves both image quality and diversity and is reported to outperform recent diffusion models and GANs.
Contribution
The paper proposes Token-Critic, an auxiliary model trained to tell original visual tokens apart from tokens sampled by the generative transformer. During non-autoregressive sampling it guides which tokens are accepted and which are rejected and resampled, improving image quality and diversity.
Findings
Token-Critic improves sampling efficiency and quality.
The method outperforms recent diffusion models and GANs.
Token-Critic yields a better trade-off between image quality and diversity.
Abstract
Non-autoregressive generative transformers recently demonstrated impressive image generation performance, and orders of magnitude faster sampling than their autoregressive counterparts. However, optimal parallel sampling from the true joint distribution of visual tokens remains an open challenge. In this paper we introduce Token-Critic, an auxiliary model to guide the sampling of a non-autoregressive generative transformer. Given a masked-and-reconstructed real image, the Token-Critic model is trained to distinguish which visual tokens belong to the original image and which were sampled by the generative transformer. During non-autoregressive iterative sampling, Token-Critic is used to select which tokens to accept and which to reject and resample. Coupled with Token-Critic, a state-of-the-art generative transformer significantly improves its performance, and outperforms recent…
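The training setup and sampling loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_generator` and `toy_critic` are hypothetical stand-ins that return random values where the real models would return learned predictions, the `MASK` id is made up, and the cosine rejection schedule is an assumed choice that the abstract does not specify.

```python
import math
import random

MASK = -1  # hypothetical id marking a masked position


def toy_generator(tokens, vocab_size, rng):
    """Stand-in for the generative transformer: fills every masked position
    with a random token id. A real model would predict these tokens."""
    return [rng.randrange(vocab_size) if tok == MASK else tok for tok in tokens]


def toy_critic(tokens, rng):
    """Stand-in for Token-Critic: one score per token, where a higher score
    means 'more likely to belong to a real image'. Random in this sketch."""
    return [rng.random() for _ in tokens]


def critic_training_pair(real_tokens, mask_ratio, vocab_size, rng):
    """Mask a real token sequence, reconstruct it with the generator, and
    label each position: 1 = original token, 0 = sampled by the generator."""
    masked = [rng.random() < mask_ratio for _ in real_tokens]
    corrupted = [MASK if m else tok for tok, m in zip(real_tokens, masked)]
    reconstructed = toy_generator(corrupted, vocab_size, rng)
    labels = [0 if m else 1 for m in masked]
    return reconstructed, labels


def sample_with_critic(num_tokens, vocab_size, steps, rng):
    """Iterative parallel sampling: fill all masked positions, score every
    token with the critic, then reject (re-mask) the lowest-scored tokens."""
    tokens = [MASK] * num_tokens
    for t in range(1, steps + 1):
        tokens = toy_generator(tokens, vocab_size, rng)
        if t == steps:
            break  # last step: accept every token
        # Assumed cosine schedule: reject fewer tokens as sampling progresses.
        num_reject = math.floor(num_tokens * math.cos(0.5 * math.pi * t / steps))
        scores = toy_critic(tokens, rng)
        order = sorted(range(num_tokens), key=lambda i: scores[i])
        for i in order[:num_reject]:
            tokens[i] = MASK
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_with_critic(num_tokens=16, vocab_size=8, steps=4, rng=rng))
```

In this sketch the critic scores every token at each step, so a token accepted earlier can still be rejected later. Whether the actual method behaves this way is not stated in the abstract and is treated here as an assumption.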
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
Methods: Diffusion
