DGAE: Diffusion-Guided Autoencoder for Efficient Latent Representation Learning

Dongxu Liu; Jiahui Zhu; Yuang Peng; Haomiao Tang; Yuwei Chen; Chunrui Han; Zheng Ge; Daxin Jiang; Mingxue Liao

arXiv:2506.09644 · cs.CV · January 14, 2026

TL;DR

DGAE introduces a diffusion-guided autoencoder that enhances latent space efficiency and stability, achieving high-quality image reconstruction with smaller latent dimensions and faster diffusion model convergence.

Contribution

The paper presents DGAE, a novel autoencoder that uses diffusion guidance to improve decoder expressiveness and reduce latent space size, addressing training instability and performance degradation.

Findings

1. Mitigates performance loss under high compression
2. Achieves state-of-the-art results with a 2x smaller latent space
3. Facilitates faster convergence of diffusion models
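To make "2x smaller latent space" concrete: an autoencoder with spatial downsampling factor f and c latent channels maps an H×W RGB image to (H/f)·(W/f)·c values, so halving c at a fixed f halves the latent size. The sketch below only does that arithmetic. The helper names and the 256×256, f=16, c=32 vs. c=16 configuration are hypothetical illustrations, not the paper's reported settings.

```python
def latent_numel(h, w, f, c):
    """Number of values in an (h/f) x (w/f) x c latent (hypothetical helper)."""
    if h % f or w % f:
        raise ValueError("spatial size must be divisible by the downsampling factor")
    return (h // f) * (w // f) * c


def compression_ratio(h, w, f, c, in_channels=3):
    """Ratio of input pixel values to latent values."""
    return (h * w * in_channels) / latent_numel(h, w, f, c)


# Hypothetical configs: same spatial factor f=16, half the channels.
print(latent_numel(256, 256, 16, 32))       # 8192
print(latent_numel(256, 256, 16, 16))       # 4096
print(compression_ratio(256, 256, 16, 16))  # 48.0
```

Keeping f fixed and reducing c isolates the "smaller latent dimensionality" axis from the "higher spatial compression" axis that the abstract treats as a separate goal.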

Abstract

Autoencoders empower state-of-the-art image and video generative models by compressing pixels into a latent space through visual tokenization. Although recent advances have alleviated the performance degradation of autoencoders under high compression ratios, addressing the training instability caused by GAN-based training remains an open challenge. While improving spatial compression, we also aim to minimize the latent space dimensionality, enabling more efficient and compact representations. To tackle these challenges, we focus on improving the decoder's expressiveness. Concretely, we propose DGAE, which employs a diffusion model to guide the decoder in recovering informative signals that are not fully decoded from the latent representation. With this design, DGAE effectively mitigates the performance degradation under high spatial compression rates. At the same time, DGAE achieves state-of-the-art…
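The abstract says a diffusion model guides the decoder, but this page does not give the formulation. One common way to build a latent-conditioned diffusion decoder is to train a denoiser with the standard DDPM noise-prediction loss, passing the encoder latent z as conditioning. The NumPy sketch below assumes that formulation and is not DGAE's implementation: `toy_encoder` (average pooling), `make_alphas_cumprod` (a linear β schedule from 1e-4 to 0.02 over 1000 steps), `ddpm_decoder_loss`, and `zero_denoiser` are all hypothetical placeholders. It omits the real networks and any additional losses the paper may use.

```python
import numpy as np


def toy_encoder(x, f):
    """Hypothetical stand-in encoder: f x f average pooling, (B, C, H, W) -> (B, C, H/f, W/f)."""
    b, c, h, w = x.shape
    assert h % f == 0 and w % f == 0, "spatial size must be divisible by f"
    return x.reshape(b, c, h // f, f, w // f, f).mean(axis=(3, 5))


def make_alphas_cumprod(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def ddpm_decoder_loss(x0, z, denoiser, alphas_cumprod, rng):
    """Noise-prediction loss for a decoder-denoiser conditioned on the latent z."""
    b = x0.shape[0]
    t = rng.integers(0, len(alphas_cumprod), size=b)         # one timestep per sample
    eps = rng.standard_normal(x0.shape)                       # target noise
    a_bar = alphas_cumprod[t].reshape(b, *([1] * (x0.ndim - 1)))
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps    # forward process q(x_t | x_0)
    eps_hat = denoiser(x_t, t, z)                             # latent-conditioned prediction
    return float(np.mean((eps_hat - eps) ** 2))


def zero_denoiser(x_t, t, z):
    """Placeholder network that ignores its inputs and predicts zero noise."""
    return np.zeros_like(x_t)


rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 3, 32, 32))
z = toy_encoder(x0, f=8)                                       # latent shape (8, 3, 4, 4)
loss = ddpm_decoder_loss(x0, z, zero_denoiser, make_alphas_cumprod(1000), rng)
```

With the zero predictor the loss is just the mean squared noise, close to 1. A trained decoder would drive it lower by using z to recover detail that the latent alone does not fully decode, which is the role the abstract assigns to the diffusion guidance.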

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Domain Adaptation and Few-Shot Learning