Improved Masked Image Generation with Token-Critic

José Lezama; Huiwen Chang; Lu Jiang; Irfan Essa

arXiv:2209.04439 · cs.CV · September 12, 2022

TL;DR

This paper introduces Token-Critic, an auxiliary model that guides sampling from a non-autoregressive generative transformer. On class-conditional ImageNet synthesis, it improves the trade-off between image quality and diversity and outperforms recent diffusion models and GANs.

Contribution

The paper proposes Token-Critic, an auxiliary model that improves non-autoregressive image sampling by selecting which sampled tokens to accept and which to reject and resample, leading to better image quality and diversity.

Findings

01

Token-Critic improves sampling efficiency and quality.

02

The method outperforms recent diffusion models and GANs.

03

Enhanced trade-off between image quality and diversity.

Abstract

Non-autoregressive generative transformers recently demonstrated impressive image generation performance, and orders of magnitude faster sampling than their autoregressive counterparts. However, optimal parallel sampling from the true joint distribution of visual tokens remains an open challenge. In this paper we introduce Token-Critic, an auxiliary model to guide the sampling of a non-autoregressive generative transformer. Given a masked-and-reconstructed real image, the Token-Critic model is trained to distinguish which visual tokens belong to the original image and which were sampled by the generative transformer. During non-autoregressive iterative sampling, Token-Critic is used to select which tokens to accept and which to reject and resample. Coupled with Token-Critic, a state-of-the-art generative transformer significantly improves its performance, and outperforms recent…
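The two procedures the abstract describes, building a training example for the critic and using the critic during iterative sampling, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_generator` and `toy_critic` are hypothetical random stand-ins for the trained networks, `MASK` is a hypothetical mask-token id, the critic labels are assigned by masked position, and the linear rejection schedule is chosen only for simplicity.

```python
import numpy as np

MASK = -1  # hypothetical id for the mask token (assumption)


def toy_generator(tokens, vocab_size, rng):
    """Stand-in for the generative transformer: fills every masked slot."""
    proposal = tokens.copy()
    masked = tokens == MASK
    proposal[masked] = rng.integers(0, vocab_size, size=int(masked.sum()))
    return proposal


def toy_critic(tokens, rng):
    """Stand-in for Token-Critic: one score per token, higher = more plausible."""
    return rng.random(tokens.shape)


def critic_training_example(real_tokens, mask_ratio, vocab_size, rng):
    """Mask a real token sequence, reconstruct it, and label each position."""
    n = real_tokens.size
    masked_idx = rng.choice(n, size=int(round(mask_ratio * n)), replace=False)
    masked = real_tokens.copy()
    masked[masked_idx] = MASK
    reconstructed = toy_generator(masked, vocab_size, rng)
    labels = np.ones(n, dtype=np.int64)  # 1 = token from the original image
    labels[masked_idx] = 0               # 0 = token sampled by the generator
    return reconstructed, labels


def sample_with_critic(n_tokens, vocab_size, n_steps, rng):
    """Iterative sampling: propose tokens, score them, re-mask the worst ones."""
    tokens = np.full(n_tokens, MASK, dtype=np.int64)
    for step in range(1, n_steps + 1):
        proposal = toy_generator(tokens, vocab_size, rng)
        scores = toy_critic(proposal, rng)
        # Linear schedule (assumption): reject fewer tokens each step, none at the end.
        n_reject = int(n_tokens * (1 - step / n_steps))
        reject_idx = np.argsort(scores)[:n_reject]  # lowest-scoring tokens
        tokens = proposal.copy()
        tokens[reject_idx] = MASK  # rejected tokens are resampled next step
    return tokens


rng = np.random.default_rng(0)
sample = sample_with_critic(n_tokens=16, vocab_size=8, n_steps=4, rng=rng)
```

Because the critic scores every token in the proposal, a token accepted in an earlier step can still be rejected later. With real models, the critic would be trained as a binary classifier on pairs like those returned by `critic_training_example`.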

Code & Models

Repositories

lucidrains/phenaki-pytorch
pytorch

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging

Methods: Diffusion