Latent Diffusion Models with Masked AutoEncoders

Junho Lee; Jeongwoo Shin; Hyungwook Choi; Joonseok Lee

arXiv:2507.09984 · cs.CV · October 23, 2025

TL;DR

This paper investigates the properties of autoencoders in Latent Diffusion Models, identifies key limitations, and proposes Variational Masked AutoEncoders (VMAEs) to improve image generation quality.

Contribution

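It introduces VMAEs, which leverage the hierarchical features maintained by Masked AutoEncoders to improve LDM autoencoders, addressing the failure of existing autoencoders to satisfy latent smoothness, perceptual compression quality, and reconstruction quality at the same time.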

Findings

1. VMAEs improve latent smoothness and perceptual quality.
2. Integration of VMAEs enhances image generation performance.
3. The proposed framework outperforms existing autoencoder designs in LDMs.

Abstract

In spite of the remarkable potential of Latent Diffusion Models (LDMs) in image generation, the desired properties and optimal design of the autoencoders have been underexplored. In this work, we analyze the role of autoencoders in LDMs and identify three key properties: latent smoothness, perceptual compression quality, and reconstruction quality. We demonstrate that existing autoencoders fail to simultaneously satisfy all three properties, and propose Variational Masked AutoEncoders (VMAEs), taking advantage of the hierarchical features maintained by Masked AutoEncoders. We integrate VMAEs into the LDM framework, introducing Latent Diffusion Models with Masked AutoEncoders (LDMAEs). Our code is available at https://github.com/isno0907/ldmae.
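
The two ingredients named in "Variational Masked AutoEncoders" can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation (see the repository above for that). It only shows, under stated assumptions, MAE-style random patch masking and the standard variational reparameterization with its KL term. The function names, the 16-patch grid, and the 0.75 mask ratio are hypothetical choices made for illustration.

```python
import math
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Split patch indices into (visible, masked) lists, MAE-style.

    Assumption: masking is uniform random over patches; the paper's
    exact masking scheme is not stated in the abstract.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def reparameterize(mean, log_var, rng):
    """Sample z = mean + sigma * eps, with eps ~ N(0, 1), elementwise."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


def kl_to_standard_normal(mean, log_var):
    """KL(N(mean, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, log_var)
    )


# Hypothetical toy values: 16 patches, 75% masked, a 2-dimensional latent.
rng = random.Random(0)
visible, masked = random_patch_mask(num_patches=16, mask_ratio=0.75, rng=rng)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
kl = kl_to_standard_normal([0.0, 1.0], [0.0, -2.0])
```

In a real model the mean and log-variance would come from an encoder that sees only the visible patches, and the KL term would be one part of the training loss. How the paper weights or combines these terms is not specified in the abstract, so none of that is assumed here.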

Code & Models

Models

Hugging Face: isno0907/ldmae

Taxonomy

Topics: Topic Modeling