DiffEnc: Variational Diffusion with a Learned Encoder

Beatrix M. G. Nielsen; Anders Christensen; Andrea Dittadi; Ole Winther

arXiv:2310.19789·cs.LG·October 20, 2025·1 cites

TL;DR

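This paper introduces DiffEnc, a diffusion model whose diffusion process uses a learned encoder to produce a data- and depth-dependent mean, which yields a statistically significant likelihood improvement on CIFAR-10. It also treats the ratio of the noise variances of the reverse encoder process and the generative process as a free weight parameter, which leads to new theoretical insights.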

Contribution

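The paper proposes a diffusion framework with a learned encoder that gives the diffusion process a data- and depth-dependent mean, and it lets the ratio between the noise variances of the reverse encoder process and the generative process act as a free weight parameter. Both changes add flexibility while retaining the parameter sharing and efficient loss computation of standard diffusion models.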

Findings

01 Significant likelihood improvement on CIFAR-10

02 Theoretical insights into ELBO and noise scheduling

03 Flexible diffusion loss with a learned encoder

Abstract

Diffusion models may be viewed as hierarchical variational autoencoders (VAEs) with two improvements: parameter sharing for the conditional distributions in the generative process and efficient computation of the loss as independent terms over the hierarchy. We consider two changes to the diffusion model that retain these advantages while adding flexibility to the model. Firstly, we introduce a data- and depth-dependent mean function in the diffusion process, which leads to a modified diffusion loss. Our proposed framework, DiffEnc, achieves a statistically significant improvement in likelihood on CIFAR-10. Secondly, we let the ratio of the noise variance of the reverse encoder process and the generative process be a free weight parameter rather than being fixed to 1. This leads to theoretical insights: For a finite depth hierarchy, the evidence lower bound (ELBO) can be used as an…
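
The first change described in the abstract can be illustrated with a small sketch. It is not taken from the official repository: the cosine schedule, the `toy_encoder` function, and every other name below are assumptions chosen for illustration. The only idea it tries to show is that the mean of the noised latent is computed from an encoded, depth-dependent version of the data rather than from the data itself.

```python
import math
import random


def alpha_sigma(t):
    """Variance-preserving schedule (an assumption): alpha_t^2 + sigma_t^2 = 1."""
    alpha = math.cos(0.5 * math.pi * t)
    sigma = math.sin(0.5 * math.pi * t)
    return alpha, sigma


def toy_encoder(x, t, shift=0.1):
    """Hypothetical stand-in for a learned encoder x_t = enc(x, t).

    It returns x unchanged at t = 0 and applies a small shift that depends
    on both the data and the depth t otherwise. A real encoder would be a
    trained network; this closed form is only a placeholder.
    """
    return [xi + shift * t * math.tanh(xi) for xi in x]


def sample_z_t(x, t, rng, encoder=None):
    """Draw z_t ~ N(alpha_t * mean, sigma_t^2 I).

    encoder=None recovers the usual forward process with mean alpha_t * x.
    With an encoder, the mean becomes alpha_t * enc(x, t).
    """
    alpha, sigma = alpha_sigma(t)
    mean = x if encoder is None else encoder(x, t)
    return [alpha * m + sigma * rng.gauss(0.0, 1.0) for m in mean]


rng = random.Random(0)
x = [0.5, -1.0, 0.25]
z_plain = sample_z_t(x, 0.3, rng)                     # standard noising step
z_enc = sample_z_t(x, 0.3, rng, encoder=toy_encoder)  # data-dependent mean
```

With `encoder=None` the function reduces to the standard noising step, so under these assumptions the encoder can be read as an optional extension of the usual forward process.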

Peer Reviews

Decision: ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 2

Strengths

The paper is well written and easy to follow. It adds a simple yet interesting addition to diffusion by introducing a mean shift to the forward diffusion while not being required in the sampling process and therefore ensuring its scalability. The theoretical analysis of the different noise variances adds an interesting flavour too. The results seem to indicate that the added encoder improves the performance in terms of bits per dimension.

Weaknesses

The paper has very limited evaluation and doesn't compare to some of the relevant baselines that are even mentioned in the paper, like latent diffusion, and only compares on CIFAR-10 and MNIST. Furthermore, it mentions that some methods only show improvement after longer training, hinting at potential inconsistencies in the results in case of slightly different training setups due to not training till convergence. It is hard to judge whether the proposed changes are a significant improvement due to t

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 1

Strengths

I believe that implementing a trainable encoder within the context of the diffusion model represents a promising avenue for enhancing diffusion models, particularly in terms of ELBO optimization. This paper offers comprehensive insights into the derivation and analysis, rendering it accessible and straightforward to grasp. The experimental findings are not only persuasive but also harmonize effectively with the theoretical framework. For instance, in Figure 1, we observe a logical outcome indica

Weaknesses

1. The organization of sections is perplexing. It's challenging for me to discern whether Section 2 serves as an introductory section or is meant to highlight one of your contributions. 2. The absence of a central theorem throughout the paper poses a difficulty for readers in anticipating the direction of the derivations and what to expect.

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

The high-level idea for the paper is quite natural and something that somebody was bound to try because of its potential impact. Overall, I found the writing fairly clear, with some exceptions that I will mention in the next section. The analysis of the method is fairly extensive and supported by a lot of details in the appendix, though these are primarily mathematical proofs and not necessarily an exploration of design decisions that have a high level of practical significance.

Weaknesses

The presentation of the method, namely Sections 3 and 6, could be improved significantly. There are a lot of variable names, and I had to read through the section many times in order to understand what was happening, even though the final procedure is not that complex. Moving the figure provided in Appendix A to the main text might be helpful in this regard. Or you could include an algorithm, or simply a link to your code, as these would all be easier to parse as someone familiar with common diffus

Code & Models

Repositories

bemigini/diffenc · JAX · Official

Videos

DiffEnc: Variational Diffusion with a Learned Encoder · SlidesLive

Taxonomy

Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Topic Modeling

Methods: Diffusion