[MASK] is All You Need
Vincent Tao Hu, Björn Ommer

TL;DR
This paper introduces Discrete Interpolants, a unified framework that connects Masked Generative Models and Diffusion Models using discrete-state models, enabling scalable vision applications and flexible task reformulations.
Contribution
It proposes a novel discrete-state modeling approach that unifies different generative paradigms and extends their application to discriminative tasks like image segmentation.
Findings
Achieved state-of-the-art results on ImageNet256.
Demonstrated flexible conditional sampling with a single trained model.
Unified framework bridging Masked Generative and Diffusion models.
Abstract
In generative models, two paradigms have gained traction in various applications: next-set prediction-based Masked Generative Models and next-noise prediction-based Non-Autoregressive Models, e.g., Diffusion Models. In this work, we propose using discrete-state models to connect them and explore their scalability in the vision domain. First, we conduct a step-by-step analysis in a unified design space across the two types of models, covering timestep independence, noise schedule, temperature, guidance strength, etc., in a scalable manner. Second, we re-cast typical discriminative tasks, e.g., image segmentation, as an unmasking process from [MASK] tokens on a discrete-state model. This enables us to perform various sampling processes, including flexible conditional sampling, by training only once to model the joint distribution. All aforementioned explorations lead to our framework named…
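The unmasking process described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `MASK` id, the independent per-token masking rule, the equal-share reveal schedule, and the `predict_logits` callable (a stand-in for a trained network) are all assumptions made for illustration.

```python
import numpy as np

MASK = -1  # hypothetical id reserved for the [MASK] token


def corrupt(tokens, t, rng):
    """Replace each token with MASK independently with probability t (assumed rule)."""
    tokens = np.asarray(tokens)
    drop = rng.random(tokens.shape) < t
    return np.where(drop, MASK, tokens)


def unmask(x, predict_logits, steps, rng, temperature=1.0):
    """Iteratively fill MASK positions; tokens that are already known never change."""
    x = np.array(x, copy=True)
    for step in range(steps):
        masked = np.flatnonzero(x == MASK)
        if masked.size == 0:
            break
        # Reveal an equal share of the remaining masks per step,
        # and everything that is left on the final step.
        n_reveal = int(np.ceil(masked.size / (steps - step)))
        chosen = rng.choice(masked, size=n_reveal, replace=False)
        # predict_logits is a hypothetical model: (length,) -> (length, vocab)
        logits = predict_logits(x)[chosen] / temperature
        # Gumbel-max trick: draws one category per row from softmax(logits)
        gumbel = rng.gumbel(size=logits.shape)
        x[chosen] = np.argmax(logits + gumbel, axis=-1)
    return x


rng = np.random.default_rng(0)
vocab = 8
clean = rng.integers(0, vocab, size=16)
noisy = corrupt(clean, t=0.5, rng=rng)
toy_model = lambda x: np.zeros((x.size, vocab))  # uniform logits, placeholder only
sample = unmask(noisy, toy_model, steps=4, rng=rng)
```

Under these assumptions, conditional sampling amounts to choosing which positions start as `MASK`: if a sequence held image tokens followed by label tokens, masking only the label positions would turn the same loop into a segmentation-style prediction, while masking everything would give unconditional generation.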
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Domain Adaptation and Few-Shot Learning
Methods: Diffusion
