Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance

Jiwan Hur; Dong-Jae Lee; Gyojin Han; Jaehyun Choi; Yunho Jeon; Junmo Kim

arXiv:2410.13136 · cs.CV · October 18, 2024

TL;DR

This paper introduces a self-guidance sampling method for masked generative models (MGMs) that improves image quality and diversity, outperforming existing methods with efficient training and sampling.

Contribution

It extends guidance methods to MGMs and proposes a self-guidance approach using semantic smoothing, enhancing image synthesis performance.

Findings

01 Self-guidance improves image quality and diversity in MGMs.

02 The proposed method outperforms existing sampling techniques.

03 It achieves a better quality-diversity trade-off with efficient training.

Abstract

Masked generative models (MGMs) have shown impressive generative ability while requiring an order of magnitude fewer sampling steps than continuous diffusion models. However, MGMs still underperform recent, well-developed continuous diffusion models of similar size in image synthesis, in terms of both the quality and the diversity of generated samples. A key factor in the performance of continuous diffusion models is guidance, which enhances sample quality at the expense of diversity. In this paper, we extend these guidance methods to a generalized guidance formulation for MGMs and propose a self-guidance sampling method, which leads to better generation quality. The proposed approach leverages an auxiliary task for semantic smoothing in vector-quantized token space, analogous to Gaussian blur in continuous pixel space. Equipped with the…
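To make the idea of guidance over discrete tokens concrete, here is a minimal sketch of the extrapolation form commonly used by guidance methods, applied to the logits of a single masked token. It is an illustration under stated assumptions, not the paper's formulation, which this abstract does not spell out. The names `guided_logits`, `weak_logits`, and `scale`, and all numeric values, are hypothetical; `weak_logits` stands in for a prediction made from a less informative (for example, semantically smoothed) input.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_logits(logits, weak_logits, scale):
    """Extrapolate away from a weaker prediction: l + scale * (l - l_weak).

    Assumption: this is the generic guidance form, used here only as an
    illustration; it is not taken from the paper.
    """
    return [l + scale * (l - w) for l, w in zip(logits, weak_logits)]


# Hypothetical logits over a 4-entry codebook for one masked token position.
logits = [2.0, 1.0, 0.5, 0.1]        # regular model prediction
weak_logits = [1.2, 1.0, 0.8, 0.6]   # prediction from a smoothed input (assumed)

probs_plain = softmax(logits)
probs_guided = softmax(guided_logits(logits, weak_logits, scale=2.0))
```

With `scale=0.0` the function returns the regular logits unchanged. Larger values shift probability mass toward tokens that the regular prediction favors over the weak one, which sharpens the distribution and is one way to read the quality-versus-diversity trade-off the abstract mentions.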

Code & Models

Repositories

jiwanhur/unlockmgm · jax · Official

Models

🤗 HURJIWAN/UnlockMGM · model

Videos

Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance · slideslive

Taxonomy

Topics: Robotics and Automated Systems · Robotics and Sensor-Based Localization · Augmented Reality Applications

Methods: Diffusion