
TL;DR
This paper introduces the Diffusion Encoder, which replaces the simple-distribution encoder of a variational autoencoder with a diffusion model and uses an alternating training scheme to keep the encoder and decoder synchronized.
Contribution
It proposes an alternating training scheme, inspired by the expectation-maximization (EM) algorithm, that lets a diffusion model serve as the encoder in an autoencoder framework.
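For readers unfamiliar with the pattern that inspires the scheme, the following is a minimal sketch of classic expectation-maximization on a one-dimensional mixture of two Gaussians. It is not the paper's training procedure; it only illustrates the idea of alternating between two updates, each made while the other side is held fixed. The function names, the initialization, and the toy data are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, steps=50):
    """Fit a two-component 1-D Gaussian mixture by alternating E and M steps."""
    # Illustrative initialization (an assumption): means at the data extremes.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(steps):
        # E-step: parameters are frozen; estimate which component produced each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: responsibilities are frozen; re-fit each component's parameters.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # floor keeps the density well defined
            weights[k] = nk / len(data)
    return means, variances, weights


# Toy data: two well-separated clusters centered at -3 and +3.
rng = random.Random(0)
data = [rng.gauss(-3.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
means, variances, weights = em_two_gaussians(data)
```

Each half-step only ever sees a fixed version of the other side's estimate, which is the general property the summary attributes to the alternating scheme.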
Findings
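Enables a diffusion model to act as the encoder, with more reliable synchronization between encoder and decoder.
Preserves the simple and efficient training objective of standard diffusion models.
Frees the encoder from the simple family of distributions that the reparameterization trick imposes, allowing more expressive latent representations.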
Abstract
We construct a new kind of encoder, leveraging the expressive power of diffusion models. In a traditional variational autoencoder, the encoder and decoder jointly negotiate a latent representation of the input. This is made possible by the reparameterization trick, which simplifies training at the cost of restricting the encoder to a simple family of distributions. Replacing this encoder with a diffusion model requires rethinking how the decoder pressure can be transmitted back to the encoder, given that they tend to update their internal estimates of the latent in opposing directions. We solve this problem with an alternating training scheme, inspired by the expectation-maximization algorithm. Our method enables more reliable synchronization between encoder and decoder, while preserving the simple and efficient training objective of standard diffusion models.
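As background for the abstract, the reparameterization trick in a standard Gaussian variational autoencoder can be sketched as follows. This is the conventional baseline the abstract refers to, not the Diffusion Encoder itself. The names `sample_latent` and `kl_to_standard_normal`, and the example values of `mu` and `log_var`, are assumptions chosen for illustration.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps."""
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)
        eps = rng.gauss(0.0, 1.0)  # noise drawn independently of the encoder outputs
        z.append(m + sigma * eps)  # z is a deterministic function of (mu, sigma) given eps
    return z


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


# Hypothetical encoder outputs for a 2-D latent.
rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -2.0]
z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

Because the sample is written as a differentiable function of `mu` and `log_var`, the decoder's loss can be backpropagated into the encoder. The cost is that the encoder's output distribution must belong to a family that admits this form, here a diagonal Gaussian, which is the restriction the abstract describes.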
