Unified Auto-Encoding with Masked Diffusion

Philippe Hansen-Estruch; Sriram Vishwanath; Amy Zhang; Manan Tomar

arXiv:2406.17688·cs.CV·June 26, 2024


TL;DR

The paper introduces Unified Masked Diffusion (UMD), a novel auto-encoding framework that combines noise-based and patch-based corruption techniques, enhancing generative and representation learning without heavy data augmentation.

Contribution

UMD unifies diffusion and masked auto-encoder approaches into a single training framework, improving efficiency and performance in various downstream tasks.

Findings

01. Strong performance in generative tasks

02. Effective in representation learning

03. More computationally efficient than prior methods

Abstract

At the core of both successful generative and self-supervised representation learning models there is a reconstruction objective that incorporates some form of image corruption. Diffusion models implement this approach through a scheduled Gaussian corruption process, while masked auto-encoder models do so by masking patches of the image. Despite their different approaches, the underlying similarity in their methodologies suggests a promising avenue for an auto-encoder capable of both de-noising tasks. We propose a unified self-supervised objective, dubbed Unified Masked Diffusion (UMD), that combines patch-based and noise-based corruption techniques within a single auto-encoding framework. Specifically, UMD modifies the diffusion transformer (DiT) training process by introducing an additional noise-free, high masking representation step in the diffusion noising schedule, and utilizes a…
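The combination of the two corruption types described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear beta schedule, the mask ratios, and the names `alpha_bar` and `corrupt` are assumptions made for illustration, and the only detail taken from the abstract is that a noise-free, high-masking step sits alongside the usual noised steps. Here step `t = 0` plays that role, and patches are plain lists of floats.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of signal kept after t steps of a (hypothetical) linear beta schedule.

    t = 0 returns exactly 1.0, i.e. no Gaussian noise is added.
    """
    prod = 1.0
    for i in range(t):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
    return prod


def corrupt(patches, t, num_steps, mask_ratio, rng):
    """Apply Gaussian noise at step t, then keep only a random subset of patches.

    patches: list of flattened patches (each a list of floats).
    Returns the visible (possibly noised) patches and their original indices.
    """
    a = alpha_bar(t, num_steps)
    n_keep = max(1, round(len(patches) * (1.0 - mask_ratio)))
    keep = sorted(rng.sample(range(len(patches)), n_keep))
    visible = [
        [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in patches[i]]
        for i in keep
    ]
    return visible, keep


rng = random.Random(0)
patches = [[float(i)] * 4 for i in range(8)]  # 8 toy patches of 4 values each

# Assumed "representation" step: no noise (t = 0), high mask ratio.
clean_visible, clean_idx = corrupt(patches, t=0, num_steps=10, mask_ratio=0.75, rng=rng)

# Assumed "generative" step: Gaussian noise at t > 0, lower mask ratio.
noisy_visible, noisy_idx = corrupt(patches, t=10, num_steps=10, mask_ratio=0.5, rng=rng)
```

In this sketch a single encoder would see either clean but heavily masked patches or noised patches, depending on the sampled step, which is one way to read the abstract's statement that both corruption schemes live in the same schedule.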


Taxonomy

Topics: Digital Filter Design and Implementation · Neural Networks and Applications

Methods: Diffusion