Adaptive Destruction Processes for Diffusion Samplers
Timofei Gritsaev, Nikita Morozov, Kirill Tamogashev, Daniil Tiapkin, Sergey Samsonov, Alexey Naumov, Dmitry Vetrov, Nikolay Malkin

TL;DR
This paper introduces a trainable, discrete-time diffusion process with adaptive destruction mechanisms, enhancing sampling efficiency and quality in generative models by decoupling generation and destruction variances.
Contribution
It proposes a flexible, trainable diffusion sampler that decouples generation and destruction variances, improving convergence and sample quality with fewer steps.
Findings
Faster convergence with limited steps.
Improved sampling quality on benchmarks.
Scalability demonstrated on GAN latent spaces.
Abstract
This paper explores the challenges and benefits of a trainable destruction process in diffusion samplers -- diffusion-based generative models trained to sample an unnormalised density without access to data samples. Contrary to the majority of work that views diffusion samplers as approximations to an underlying continuous-time model, we view diffusion models as discrete-time policies trained to produce samples in very few generation steps. We propose to trade some of the elegance of the underlying theory for flexibility in the definition of the generative and destruction policies. In particular, we decouple the generation and destruction variances, enabling both transition kernels to be learned as unconstrained Gaussian densities. We show that, when the number of steps is limited, training both generation and destruction processes results in faster convergence and improved sampling…
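The decoupling described in the abstract can be illustrated with a minimal sketch. It assumes a 1-D state and uses a hypothetical `GaussianKernel` class whose mean and log-variance are affine in the conditioning state, standing in for the neural networks a real sampler would use; time-conditioning is omitted for brevity. None of these names or parameter values come from the paper. The point is only that the generation and destruction kernels carry separate, independently parameterised variances, and that a trajectory sampled from one can be scored under both.

```python
import math
import random


def gaussian_logpdf(x, mean, log_var):
    """Log-density of a 1-D Gaussian with the given mean and log-variance."""
    return -0.5 * (
        log_var + math.log(2.0 * math.pi) + (x - mean) ** 2 / math.exp(log_var)
    )


class GaussianKernel:
    """Toy state-dependent Gaussian kernel (hypothetical stand-in for a network).

    Both the mean and the log-variance are affine functions of the
    conditioning state, so the variance is learned rather than fixed.
    """

    def __init__(self, a_mean, b_mean, a_logvar, b_logvar):
        self.a_mean, self.b_mean = a_mean, b_mean
        self.a_logvar, self.b_logvar = a_logvar, b_logvar

    def params(self, x):
        return self.a_mean * x + self.b_mean, self.a_logvar * x + self.b_logvar

    def sample(self, x, rng):
        mean, log_var = self.params(x)
        return mean + math.exp(0.5 * log_var) * rng.gauss(0.0, 1.0)

    def log_prob(self, x_out, x_in):
        mean, log_var = self.params(x_in)
        return gaussian_logpdf(x_out, mean, log_var)


def rollout(gen, des, x0, n_steps, rng):
    """Sample a trajectory with the generation kernel and score it under both."""
    x, log_pf, log_pb = x0, 0.0, 0.0
    traj = [x0]
    for _ in range(n_steps):
        x_next = gen.sample(x, rng)
        log_pf += gen.log_prob(x_next, x)  # generation step: x_t -> x_{t+1}
        log_pb += des.log_prob(x, x_next)  # destruction step: x_{t+1} -> x_t
        x = x_next
        traj.append(x)
    return traj, log_pf, log_pb


# Illustrative parameter values only; the two kernels share nothing.
rng = random.Random(0)
gen = GaussianKernel(a_mean=0.9, b_mean=0.5, a_logvar=0.05, b_logvar=-1.0)
des = GaussianKernel(a_mean=0.8, b_mean=0.0, a_logvar=-0.02, b_logvar=-0.5)
traj, log_pf, log_pb = rollout(gen, des, x0=0.0, n_steps=5, rng=rng)
```

In a trained sampler, objectives of the kind the reviews mention would compare these two accumulated log-probabilities together with the unnormalised target density at the final state; the sketch stops at computing the two quantities.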
Peer Reviews
Decision·Submitted to ICLR 2026
1. The problem studied is novel and interesting. 2. The proposed training objective, Second-Moment Divergence, deviates from the standard KL formulation, which is an interesting direction to explore. 3. The design space is meticulously swept over, with the key ingredients for stable training reported in this paper. 4. The advantage is most pronounced when the number of sampling steps is small. The paper shows that on some tasks, its method with as few as 5 steps can outperform 20-step samplers.
1. Despite the non-trivial efforts to stabilize the training, the joint training process is inherently unstable. The Trajectory Likelihood Maximization (TLM) objective, one of the main candidates for generically training the destruction process, is "unstable and often leads to divergent training" when the number of steps is large. 2. The method also seems to be highly sensitive to hyperparameters. L265-266, "tuning relative learning rates is critical for stable training". 3. The results on GAN
1. Novel joint training of generation and destruction processes in diffusion samplers, enabling improved convergence and sampling quality, especially in few-step regimes. 2. Flexible design with state-dependent, decoupled variances for both processes—only possible in discrete-time formulation—leading to enhanced adaptability to complex energy landscapes.
1. Limited visual results: The paper presents few qualitative or visual examples (only human faces in Fig. 4), making it difficult to fully assess sampling quality, especially in image-related tasks. 2. No discussion of limitations: The paper lacks a section acknowledging potential limitations (e.g., scalability to more complex distributions, sensitivity to architecture choices), which raises concerns about generalizability.
1. Framework novelty. The method is the first to extend the traditional diffusion process to learnable variances in a unified theoretical framework. 2. Integration of stability mechanisms. The paper incorporates stabilization tools inspired by the reinforcement-learning view, and Table 2 systematically evaluates the performance of each tool. 3. Scalability to high-dimensional tasks. Section 4.4 demonstrates the capability of the method on higher-dimensional image generation tasks, wh
1. Insufficient theoretical analysis. Although there is a unified framework and well-defined processes, no analysis of the convergence or gradient bias of the KL divergence is provided. 2. Lack of continuous-time analysis. There is no proof of the equivalence between the generation and the destruction processes as T goes to infinity. 3. Limited evaluation of TLM. The paper proposes TB and TLM, but the main experiments were conducted with TB.
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare
