Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer

Shitong Shao, Zikai Zhou, Tian Ye, Lichen Bai, Zhiqiang Xu, and Zeke Xie

arXiv:2411.10781 · cs.CV · February 28, 2025

TL;DR

This paper investigates and optimizes inference techniques for masked generative Transformers in high-resolution image generation, providing practical design choices that improve performance and sampling efficiency.

Contribution

The paper introduces and analyzes inference strategies designed specifically for the masked generative Transformer (MGT), an area that prior work has left largely unexamined, and demonstrates their effectiveness through extensive experiments.

Findings

01. Enhanced inference techniques improve sampling quality.

02. Design choices lead to ~70% winning rate on HPS v2.

03. Sampling acceleration methods increase efficiency.
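
A winning rate such as the one reported on HPS v2 is usually read as the fraction of prompts on which one sampler's image scores higher than another's. The sketch below shows that arithmetic only. The function name `win_rate`, the choice to count ties as losses, and the toy scores are assumptions for illustration; they are not taken from the paper or from the HPS v2 tooling.

```python
def win_rate(candidate_scores, baseline_scores):
    """Fraction of paired prompts where the candidate scores strictly higher.

    Assumption: ties count as losses. Other conventions split ties evenly.
    """
    if len(candidate_scores) != len(baseline_scores) or not candidate_scores:
        raise ValueError("score lists must be non-empty and of equal length")
    wins = sum(c > b for c, b in zip(candidate_scores, baseline_scores))
    return wins / len(candidate_scores)


# Toy, made-up preference scores for four prompts (not results from the paper).
candidate = [0.30, 0.20, 0.50, 0.40]
baseline = [0.10, 0.25, 0.40, 0.35]
print(win_rate(candidate, baseline))  # 0.75
```

Under this reading, a rate of 50% means the two samplers are indistinguishable on the metric, so a value near 70% indicates a consistent preference for the candidate.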

Abstract

Text-to-image diffusion models (DMs) are developing at an unprecedented pace, supported by thorough theoretical exploration and empirical analysis. Unfortunately, the discrepancy between DMs and autoregressive models (ARMs) complicates the path toward unified vision and language generation. Recently, the masked generative Transformer (MGT) has emerged as a promising intermediary between DMs and ARMs: it predicts randomly masked image tokens (i.e., masked image modeling), combining the efficiency of DMs with the discrete-token nature of ARMs. However, comprehensive analyses of inference for MGT are virtually non-existent, so we aim to present positive design choices to fill this gap. We propose and redesign a set of enhanced inference techniques tailored for MGT and provide a detailed analysis of their performance. Additionally, we explore several…
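
To make the masked image modeling idea in the abstract concrete, here is a minimal sketch of iterative parallel decoding in the general MaskGIT style. It is not the set of samplers studied in the paper. The cosine schedule, the greedy argmax choice, the `MASK_ID` sentinel, and the random `toy_logits` stand-in for the Transformer are all assumptions made so the example runs on its own.

```python
import numpy as np

MASK_ID = -1  # hypothetical sentinel marking a still-masked token position


def cosine_mask_ratio(step, num_steps):
    """Fraction of tokens left masked after `step` of `num_steps` steps."""
    return float(np.cos(0.5 * np.pi * step / num_steps))


def toy_logits(tokens, vocab_size, rng):
    """Stand-in for the Transformer: random logits, shape (seq_len, vocab_size)."""
    return rng.normal(size=(tokens.shape[0], vocab_size))


def iterative_decode(seq_len=16, vocab_size=32, num_steps=8, seed=0):
    rng = np.random.default_rng(seed)
    tokens = np.full(seq_len, MASK_ID, dtype=np.int64)  # start fully masked
    for step in range(1, num_steps + 1):
        masked = tokens == MASK_ID
        logits = toy_logits(tokens, vocab_size, rng)
        # Numerically stable softmax over the vocabulary axis.
        logits = logits - logits.max(axis=-1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=-1, keepdims=True)
        pred = probs.argmax(axis=-1)  # greedy token choice per position
        conf = probs.max(axis=-1)     # confidence of that choice
        # Positions decoded in earlier steps are kept and never re-masked.
        conf = np.where(masked, conf, np.inf)
        tokens = np.where(masked, pred, tokens)
        # Re-mask the least confident predictions according to the schedule.
        num_masked = int(np.floor(cosine_mask_ratio(step, num_steps) * seq_len))
        if num_masked > 0:
            tokens[np.argsort(conf)[:num_masked]] = MASK_ID
    return tokens


print(iterative_decode())
```

All positions are predicted in parallel at every step, but only the most confident predictions are committed, so the full sequence is produced in `num_steps` forward passes rather than one pass per token. Practical samplers typically replace the argmax with temperature-controlled sampling and add noise to the confidence ranking.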

Code & Models

Repositories

xie-lab-ml/Meissonic-Inference · jax · Official

Taxonomy

Methods: Attention Is All You Need · Adam · Residual Connection · Byte Pair Encoding · Linear Layer · Sparse Evolutionary Training · Absolute Position Encodings · Multi-Head Attention · Dense Connections · Label Smoothing