TL;DR
This paper introduces a method for training cascaded diffusion models as likelihood models using hierarchical volume-preserving maps, enabling tractable likelihood evaluation and improving performance on density estimation, compression, and out-of-distribution detection.
Contribution
The paper models the diffusion process on latent spaces induced by hierarchical volume-preserving maps, such as Laplacian pyramids and wavelet transforms, so that the likelihood of a cascaded model can be evaluated directly rather than requiring marginalization over intermediate scales.
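As a rough illustration of why a volume-preserving map keeps the likelihood tractable: if z = h(x) and |det J_h| = 1, the change-of-variables formula log p_x(x) = log p_z(h(x)) + log |det J_h(x)| loses its Jacobian term, so the density modeled in the latent space is also the density of the data. The sketch below uses an orthonormal 2D Haar wavelet transform as a stand-in for such a map. The function names (haar_level, haar_decompose, and so on) are hypothetical, and this is an assumption-laden sketch, not the paper's implementation. It checks numerically that the transform is invertible and that log |det J| is zero on a small input.

```python
import numpy as np

def haar_level(x):
    """One level of an orthonormal 2D Haar transform; x is (H, W) with even H, W."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # coarse approximation (half resolution)
    lh = (a - b + c - d) / 2.0   # detail bands
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, (lh, hl, hh)

def haar_level_inverse(ll, lh, hl, hh):
    """Exact inverse of haar_level (the 4x4 mixing matrix is its own inverse)."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def haar_decompose(x, levels):
    """Hierarchical decomposition: coarsest approximation plus per-level details."""
    details = []
    for _ in range(levels):
        x, d = haar_level(x)
        details.append(d)
    return x, details

def haar_reconstruct(ll, details):
    """Invert haar_decompose, going from the coarsest level back to full resolution."""
    for lh, hl, hh in reversed(details):
        ll = haar_level_inverse(ll, lh, hl, hh)
    return ll

def flatten(ll, details):
    """Stack all latent coefficients into one vector (same length as the input)."""
    parts = [ll.ravel()] + [band.ravel() for d in details for band in d]
    return np.concatenate(parts)

def jacobian(shape, levels):
    """Jacobian of the (linear) transform, built by mapping each basis vector."""
    n = shape[0] * shape[1]
    cols = []
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        cols.append(flatten(*haar_decompose(e.reshape(shape), levels)))
    return np.stack(cols, axis=1)

# Small numerical check on a random 4x4 "image" with two scales.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))
ll, details = haar_decompose(x, levels=2)
x_rec = haar_reconstruct(ll, details)
_, logabsdet = np.linalg.slogdet(jacobian(x.shape, levels=2))
```

Here x_rec should match x up to floating-point error and logabsdet should be numerically zero, which is the property that lets a density defined on the multi-scale coefficients be read off as a density on pixels. Note that the latent vector has exactly as many entries as the input, so no extra variables need to be marginalized out.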
Findings
Improved likelihood estimation for high-resolution generative models
Enhanced performance in density estimation, compression, and out-of-distribution detection
Theoretical link to score matching under Earth Mover's Distance
Abstract
Cascaded models are multi-scale generative models with a marked capacity for producing perceptually impressive samples at high resolutions. In this work, we show that they can also be excellent likelihood models, so long as we overcome a fundamental difficulty with probabilistic multi-scale models: the intractability of the likelihood function. Chiefly, in cascaded models each intermediary scale introduces extraneous variables that cannot be tractably marginalized out for likelihood evaluation. This issue vanishes by modeling the diffusion process on latent spaces induced by a class of transformations we call hierarchical volume-preserving maps, which decompose spatially structured data in a hierarchical fashion without introducing local distortions in the latent space. We demonstrate that two such maps are well-known in the literature for multiscale modeling: Laplacian pyramids and…
Taxonomy
Methods: Diffusion · Laplacian Pyramid
