DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space

Junyu Chen; Dongyun Zou; Wenkun He; Junsong Chen; Enze Xie; Song Han; Han Cai

arXiv:2508.00413 · cs.CV · August 4, 2025


Open Access

TL;DR

DC-AE 1.5 introduces a structured latent space and augmented diffusion training to accelerate diffusion-model convergence and improve generation quality for high-resolution images.

Contribution

The paper proposes two techniques, Structured Latent Space and Augmented Diffusion Training, that speed up diffusion-model convergence and improve image quality when the autoencoder uses a larger number of latent channels.

Findings

01. Faster convergence than previous autoencoders.

02. Higher image quality at increased compression ratios.

03. 4× faster generation on ImageNet 512×512.

Abstract

We present DC-AE 1.5, a new family of deep compression autoencoders for high-resolution diffusion models. Increasing the autoencoder's latent channel number is a highly effective approach for improving its reconstruction quality. However, it results in slow convergence for diffusion models, leading to poorer generation quality despite better reconstruction quality. This issue limits the quality upper bound of latent diffusion models and hinders the employment of autoencoders with higher spatial compression ratios. We introduce two key innovations to address this challenge: i) Structured Latent Space, a training-based approach to impose a desired channel-wise structure on the latent space with front latent channels capturing object structures and latter latent channels capturing image details; ii) Augmented Diffusion Training, an augmented diffusion training strategy with additional…
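
The abstract does not spell out how the channel-wise structure is imposed, so the following is only a minimal sketch under a stated assumption: that one plausible way to encourage front channels to carry object structure is to zero out the trailing channels during autoencoder training and ask the decoder to reconstruct from the front channels alone. The helper names `mask_trailing_channels` and `sample_keep_count`, the tensor shape, and the candidate channel counts are all hypothetical and are not taken from the paper.

```python
import numpy as np


def mask_trailing_channels(latent: np.ndarray, keep: int) -> np.ndarray:
    """Return a copy of `latent` with every channel after the first `keep` zeroed.

    `latent` is assumed to have shape (batch, channels, height, width).
    """
    if not 0 < keep <= latent.shape[1]:
        raise ValueError("keep must be in [1, channels]")
    masked = latent.copy()
    masked[:, keep:] = 0.0  # front channels are left untouched
    return masked


def sample_keep_count(rng: np.random.Generator, choices: list[int]) -> int:
    """Pick how many front channels to keep for one training step (hypothetical)."""
    return int(rng.choice(choices))


# Illustrative values only: a batch of 2 latents with 128 channels on an 8x8 grid.
rng = np.random.default_rng(0)
latent = rng.standard_normal((2, 128, 8, 8))
keep = sample_keep_count(rng, [16, 32, 64, 128])
masked = mask_trailing_channels(latent, keep)
```

In a training loop under this assumption, `masked` would be passed to the decoder in place of `latent`, so that a reconstruction loss is also applied when only the front channels are available.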


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neuroimaging Techniques and Applications · Advanced Image Processing Techniques