Nested Diffusion Models Using Hierarchical Latent Priors
Xiao Zhang, Ruoxi Jiang, Rebecca Willett, Michael Maire

TL;DR
This paper presents nested diffusion models that use hierarchical latent priors to improve image generation quality, especially for complex scenes, by progressively generating and conditioning on semantic-level latent variables.
Contribution
The paper introduces a hierarchical diffusion framework whose semantic latent variables are derived from a pre-trained visual encoder. The framework improves image quality at minimal additional computational cost.
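The nested generation chain can be illustrated with a minimal sketch. It assumes a deterministic DDIM-style sampler, a toy linear noise schedule, and a hypothetical `denoiser(x_t, t, conditions)` interface that predicts the clean sample. This is not the authors' implementation. Each stage samples one variable, from the most abstract latent down to the image, and is conditioned on every output produced before it.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical interface: denoiser(x_t, t, conditions) -> estimate of clean x_0.
Denoiser = Callable[[Vector, int, Sequence[Vector]], Vector]


def alpha_bar(t: int, num_steps: int) -> float:
    """Toy linear schedule (an assumption): alpha_bar(0) = 1, alpha_bar(T) = 0.01."""
    return 1.0 - 0.99 * t / num_steps


def ddim_sample(denoiser: Denoiser, dim: int, num_steps: int,
                conditions: Sequence[Vector], rng: random.Random) -> Vector:
    """Deterministic DDIM-style sampling of one variable given its conditions."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise
    for t in range(num_steps, 0, -1):
        ab_t = alpha_bar(t, num_steps)
        ab_prev = alpha_bar(t - 1, num_steps)
        x0_hat = denoiser(x, t, conditions)
        # Recover the noise implied by x0_hat, then re-noise to level t - 1.
        eps_hat = [(xi - math.sqrt(ab_t) * x0i) / math.sqrt(1.0 - ab_t)
                   for xi, x0i in zip(x, x0_hat)]
        x = [math.sqrt(ab_prev) * x0i + math.sqrt(1.0 - ab_prev) * ei
             for x0i, ei in zip(x0_hat, eps_hat)]
    return x


def nested_sample(stages: Sequence[Tuple[Denoiser, int]],
                  num_steps: int = 50, seed: int = 0) -> List[Vector]:
    """Run the chain from the highest semantic level down to pixels.

    Each stage is conditioned on all outputs generated by the preceding stages.
    """
    rng = random.Random(seed)
    outputs: List[Vector] = []
    for denoiser, dim in stages:
        outputs.append(ddim_sample(denoiser, dim, num_steps, tuple(outputs), rng))
    return outputs


# Toy denoisers standing in for trained networks (purely illustrative).
def top_level(x: Vector, t: int, conds: Sequence[Vector]) -> Vector:
    return [1.0] * len(x)


def child_level(x: Vector, t: int, conds: Sequence[Vector]) -> Vector:
    parent = conds[-1]  # the immediately preceding (coarser) output
    return [sum(parent) / len(parent)] * len(x)


latents = nested_sample([(top_level, 4), (child_level, 8), (child_level, 16)])
print(latents[-1][:3])  # [1.0, 1.0, 1.0]
```

Because the schedule reaches `alpha_bar(0) = 1`, the final DDIM step returns the denoiser's clean estimate exactly. With these constant toy denoisers, every stage therefore ends at all ones. In the real system, each stage would be a learned diffusion model and the last stage would output the image.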
Findings
Significant improvement in image quality across multiple datasets.
The unconditional nested model outperforms the baseline conditional system.
Efficient hierarchical approach with low overhead.
Abstract
We introduce nested diffusion models, an efficient and powerful hierarchical generative framework that substantially enhances the generation quality of diffusion models, particularly for images of complex scenes. Our approach employs a series of diffusion models to progressively generate latent variables at different semantic levels. Each model in this series is conditioned on the output of the preceding higher-level models, culminating in image generation. Hierarchical latent variables guide the generation process along predefined semantic pathways, allowing our approach to capture intricate structural details while significantly improving image quality. To construct these latent variables, we leverage a pre-trained visual encoder, which learns strong semantic visual representations, and modulate its capacity via dimensionality reduction and noise injection. Across multiple datasets,…
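The latent construction described in the abstract (encoder features whose capacity is modulated by dimensionality reduction and noise injection) can be sketched as follows. This summary does not say which reduction method the authors use, so the sketch substitutes a plain random Gaussian projection. It also assumes variance-preserving noise blending with a per-level `noise_level`. The 64-dimensional `feature` is a hypothetical stand-in for a pre-trained encoder's embedding. All names here are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def random_projection(in_dim: int, out_dim: int, seed: int) -> List[Vector]:
    """Fixed Gaussian projection matrix (out_dim x in_dim); a swapped-in reducer."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def reduce_and_noise(feature: Vector, proj: List[Vector], noise_level: float,
                     rng: random.Random) -> Vector:
    """Project a feature to fewer dimensions, then blend in Gaussian noise.

    noise_level in [0, 1]: 0 keeps the projected feature, 1 replaces it with noise.
    """
    z = [sum(w * f for w, f in zip(row, feature)) for row in proj]
    a, b = math.sqrt(1.0 - noise_level), math.sqrt(noise_level)
    return [a * zi + b * rng.gauss(0.0, 1.0) for zi in z]


def build_hierarchy(feature: Vector, levels: Sequence[Tuple[int, float]],
                    seed: int = 0) -> List[Vector]:
    """One latent per level, ordered from most abstract (few dims, more noise)."""
    rng = random.Random(seed)
    latents = []
    for i, (dim, noise_level) in enumerate(levels):
        proj = random_projection(len(feature), dim, seed=seed + i + 1)
        latents.append(reduce_and_noise(feature, proj, noise_level, rng))
    return latents


# Hypothetical stand-in for a pre-trained encoder's 64-d embedding of one image.
feature = [math.sin(0.1 * k) for k in range(64)]
hierarchy = build_hierarchy(feature, levels=[(4, 0.5), (16, 0.2), (32, 0.05)])
print([len(z) for z in hierarchy])  # [4, 16, 32]
```

Fewer dimensions and more noise give a weaker, more abstract latent. This is the sense in which the two knobs "modulate capacity": a higher-level latent then guides generation without fully determining the finer levels below it.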
Taxonomy
Topics: Neural Networks and Applications
Methods: Diffusion
