Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective

Bolin Lai; Xudong Wang; Saketh Rambhatla; James M. Rehg; Zsolt Kira; Rohit Girdhar; Ishan Misra

arXiv:2511.22249 · cs.CV · December 1, 2025

TL;DR

This paper identifies frequency-related obstacles to training diffusion models in high-dimensional latent spaces and introduces FreqWarm, a frequency warm-up curriculum that improves generation quality without retraining autoencoders.

Contribution

We propose FreqWarm, a frequency warm-up curriculum that enhances high-frequency exposure during training, improving diffusion-based generation quality across various autoencoders.

Findings

01. FreqWarm reduces gFID scores significantly across multiple autoencoders.

02. High-frequency latent components are crucial for detailed generation.

03. Managing frequency exposure improves diffusibility of high-dimensional latent spaces.

Abstract

Latent diffusion has become the default paradigm for visual generation, yet we observe a persistent reconstruction-generation trade-off as latent dimensionality increases: higher-capacity autoencoders improve reconstruction fidelity but generation quality eventually declines. We trace this gap to the different behaviors in high-frequency encoding and decoding. Through controlled perturbations in both RGB and latent domains, we analyze encoder/decoder behaviors and find that decoders depend strongly on high-frequency latent components to recover details, whereas encoders under-represent high-frequency contents, yielding insufficient exposure and underfitting in high-frequency bands for diffusion model training. To address this issue, we introduce FreqWarm, a plug-and-play frequency warm-up curriculum that increases early-stage exposure to high-frequency latent signals during diffusion or…
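The abstract's distinction between low- and high-frequency latent components can be made concrete with a small sketch. The code below is an illustration only, not the paper's procedure: it assumes a latent stored as a `(C, H, W)` NumPy array, and the function names and the radial `cutoff` value are hypothetical choices. It splits a latent into two bands with a 2D FFT and reports the share of energy in the high band.

```python
import numpy as np


def split_frequency_bands(latent, cutoff=0.25):
    """Split a (C, H, W) array into low- and high-frequency parts.

    `cutoff` is a hypothetical radial threshold in cycles per sample
    (the Nyquist frequency is 0.5); it is not a value from the paper.
    """
    _, h, w = latent.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, shape (H, 1)
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, shape (1, W)
    # Radially symmetric low-pass mask, broadcast over the channel axis.
    low_mask = np.sqrt(fy ** 2 + fx ** 2) <= cutoff

    spectrum = np.fft.fft2(latent, axes=(-2, -1))
    # The mask is symmetric, so the inverse transform is real up to round-off.
    low = np.fft.ifft2(spectrum * low_mask, axes=(-2, -1)).real
    high = latent - low
    return low, high


def high_frequency_energy_ratio(latent, cutoff=0.25):
    """Fraction of the latent's squared magnitude that lies above `cutoff`."""
    _, high = split_frequency_bands(latent, cutoff)
    return float(np.sum(high ** 2) / np.sum(latent ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((4, 16, 16))  # stand-in for an encoder output
    print(high_frequency_energy_ratio(z))
```

Under these assumptions, a low ratio for encoder outputs would be one rough indicator of the under-represented high-frequency content the abstract describes. Zeroing or rescaling `high` before decoding is one way such a latent-domain perturbation could be set up.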

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Face recognition and analysis