Improving the Diffusability of Autoencoders

Ivan Skorokhodov; Sharath Girish; Benran Hu; Willi Menapace; Yanyu Li; Rameen Abdal; Sergey Tulyakov; Aliaksandr Siarohin

arXiv:2502.14831 · cs.CV · June 10, 2025

TL;DR

This paper identifies high-frequency components in autoencoder latent spaces that hinder diffusion quality and proposes a simple regularization method to improve image and video generation performance.

Contribution

It introduces scale equivariance regularization to autoencoders, significantly enhancing diffusion-based image and video synthesis quality with minimal fine-tuning.
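The summary does not spell out the loss, so the following is only a sketch of one plausible reading of scale equivariance: decoding a downsampled latent should reproduce the downsampled image. The helper names (`avg_pool`, `toy_decode`, `scale_equivariance_loss`), the average-pooling downsampler, and the nearest-neighbour stand-in decoder are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def avg_pool(a, factor=2):
    """Downsample a 2D array by averaging non-overlapping factor x factor blocks."""
    h, w = a.shape
    assert h % factor == 0 and w % factor == 0
    return a.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def toy_decode(z, up=4):
    """Hypothetical stand-in decoder: nearest-neighbour upsampling by `up`."""
    return np.kron(z, np.ones((up, up)))

def scale_equivariance_loss(decode, z, x, factor=2):
    """Mean squared error between decode(downsampled z) and downsampled x."""
    return float(np.mean((decode(avg_pool(z, factor)) - avg_pool(x, factor)) ** 2))

# A constant latent versus a +1/-1 checkerboard (the highest spatial frequency).
z_smooth = np.ones((8, 8))
z_rough = np.indices((8, 8)).sum(axis=0) % 2 * 2.0 - 1.0

loss_smooth = scale_equivariance_loss(toy_decode, z_smooth, toy_decode(z_smooth))
loss_rough = scale_equivariance_loss(toy_decode, z_rough, toy_decode(z_rough))
print(loss_smooth, loss_rough)
```

In this toy setting the penalty is zero for the smooth latent and positive for the checkerboard, because the checkerboard's detail is lost when the latent is downsampled but still shapes the downsampled image. That is the sense in which such a term would discourage high-frequency latent content.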

Findings

01. Reduces FID by 19% on ImageNet-1K 256×256 images.

02. Decreases FVD by at least 44% on Kinetics-700 videos.

03. Identifies high-frequency interference in autoencoder latent spaces.

Abstract

Latent diffusion models have emerged as the leading approach for generating high-quality images and videos, utilizing compressed latent representations to reduce the computational burden of the diffusion process. While recent advancements have primarily focused on scaling diffusion backbones and improving autoencoder reconstruction quality, the interaction between these components has received comparatively less attention. In this work, we perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces, which are especially pronounced in the autoencoders with a large bottleneck channel size. We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders the generation quality. To mitigate the issue, we propose scale equivariance: a simple regularization…
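As a rough illustration of the kind of spectral analysis the abstract mentions, the sketch below measures how much of a 2D signal's energy lies above a frequency cutoff. The function name `high_freq_energy_fraction`, the radial cutoff of 0.25 cycles per sample, and the synthetic inputs are assumptions chosen for illustration; the paper's own analysis may be set up differently.

```python
import numpy as np

def high_freq_energy_fraction(x, cutoff=0.25):
    """Fraction of spectral energy above `cutoff` cycles/sample (mean removed)."""
    x = np.asarray(x, dtype=np.float64)
    power = np.abs(np.fft.fft2(x - x.mean())) ** 2
    fy = np.fft.fftfreq(x.shape[0])[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(x.shape[1])[None, :]  # horizontal frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)
    total = power.sum()
    if total == 0.0:
        return 0.0
    return float(power[radius > cutoff].sum() / total)

# Synthetic stand-ins for a latent channel: white noise versus a slow wave.
rng = np.random.default_rng(0)
noise = rng.standard_normal((64, 64))
yy, xx = np.mgrid[0:64, 0:64]
smooth = np.sin(2 * np.pi * xx / 64) + np.cos(2 * np.pi * yy / 64)

print(high_freq_energy_fraction(noise), high_freq_energy_fraction(smooth))
```

Applied per channel to real autoencoder latents and to the corresponding RGB images, a statistic like this would give one way to compare how much high-frequency energy each space carries.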

Videos

Improving the Diffusability of Autoencoders · SlidesLive

Taxonomy

Topics: Statistical and Computational Modeling

Methods: Diffusion