Adaptive Destruction Processes for Diffusion Samplers

Timofei Gritsaev; Nikita Morozov; Kirill Tamogashev; Daniil Tiapkin; Sergey Samsonov; Alexey Naumov; Dmitry Vetrov; Nikolay Malkin

arXiv:2506.01541 · cs.LG · June 3, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a trainable, discrete-time diffusion process with adaptive destruction mechanisms, enhancing sampling efficiency and quality in generative models by decoupling generation and destruction variances.

Contribution

It proposes a flexible, trainable diffusion sampler that decouples generation and destruction variances, improving convergence and sample quality with fewer steps.

Findings

1. Faster convergence with limited steps.
2. Improved sampling quality on benchmarks.
3. Scalability demonstrated on GAN latent spaces.

Abstract

This paper explores the challenges and benefits of a trainable destruction process in diffusion samplers -- diffusion-based generative models trained to sample an unnormalised density without access to data samples. Contrary to the majority of work that views diffusion samplers as approximations to an underlying continuous-time model, we view diffusion models as discrete-time policies trained to produce samples in very few generation steps. We propose to trade some of the elegance of the underlying theory for flexibility in the definition of the generative and destruction policies. In particular, we decouple the generation and destruction variances, enabling both transition kernels to be learned as unconstrained Gaussian densities. We show that, when the number of steps is limited, training both generation and destruction processes results in faster convergence and improved sampling…
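To make the idea of decoupled variances concrete, here is a minimal NumPy sketch of a discrete-time trajectory in which the generation and destruction transitions are separate Gaussian kernels, each with its own mean and standard deviation. It is an illustration under stated assumptions, not the authors' implementation: `generation_kernel` and `destruction_kernel` are hypothetical fixed functions standing in for learned networks, and the constants inside them are arbitrary.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    """Log-density of a diagonal Gaussian, summed over the last axis."""
    z = (x - mean) / std
    return np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2 * np.pi), axis=-1)

def generation_kernel(x, t):
    # Hypothetical stand-in for a learned network:
    # mean and std of the generation step p(x_{t+1} | x_t).
    return 0.9 * x, np.full_like(x, 0.5)

def destruction_kernel(x_next, t):
    # Hypothetical stand-in for a second learned network:
    # mean and std of the destruction step q(x_t | x_{t+1}).
    # Its std is chosen independently of the generation std (decoupled).
    return 0.8 * x_next, np.full_like(x_next, 0.7)

def sample_trajectory(x0, n_steps, rng):
    """Roll out the generation process and score it under both kernels."""
    xs = [x0]
    log_p_gen = np.zeros(x0.shape[0])   # log-prob under generation kernels
    log_p_des = np.zeros(x0.shape[0])   # log-prob under destruction kernels
    x = x0
    for t in range(n_steps):
        mean_f, std_f = generation_kernel(x, t)
        x_next = mean_f + std_f * rng.standard_normal(x.shape)
        log_p_gen += gaussian_logpdf(x_next, mean_f, std_f)
        mean_b, std_b = destruction_kernel(x_next, t)
        log_p_des += gaussian_logpdf(x, mean_b, std_b)
        x = x_next
        xs.append(x)
    return np.stack(xs), log_p_gen, log_p_des

rng = np.random.default_rng(0)
xs, log_p_gen, log_p_des = sample_trajectory(np.zeros((4, 2)), n_steps=5, rng=rng)
print(xs.shape, log_p_gen.shape, log_p_des.shape)
```

The two accumulated log-probabilities are the quantities a trajectory-level training objective would compare; which objective is used, and how the kernels are parameterised, is left open here.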

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. The problem studied is novel and interesting.
2. The proposed training objective, Second-Moment Divergence, deviates from the standard KL formulation, which is an interesting direction to explore.
3. The design space is meticulously swept, and the key ingredients for stable training are reported in the paper.
4. The advantage is most pronounced when the number of sampling steps is small. The paper shows that on some tasks, its method with as few as 5 steps can outperform 20-step samplers.

Weaknesses

1. Despite the non-trivial efforts to stabilize the training, the joint training process is inherently unstable. The Trajectory Likelihood Maximization (TLM) objective, one of the main candidates for generically training the destruction process, is "unstable and often leads to divergent training" when the number of steps is large.
2. The method also seems to be highly sensitive to hyperparameters. L265-266: "tuning relative learning rates is critical for stable training".
3. The results on GAN

Reviewer 02 · Rating 6 · Confidence 2

Strengths

1. Novel joint training of generation and destruction processes in diffusion samplers, enabling improved convergence and sampling quality, especially in few-step regimes.
2. Flexible design with state-dependent, decoupled variances for both processes (only possible in the discrete-time formulation), leading to enhanced adaptability to complex energy landscapes.

Weaknesses

1. Limited visual results: The paper presents few qualitative or visual examples (only human faces in Fig. 4), making it difficult to fully assess sampling quality, especially in image-related tasks.
2. No discussion of limitations: The paper lacks a section acknowledging potential limitations (e.g., scalability to more complex distributions, sensitivity to architecture choices), which raises concerns about generalizability.

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. Framework novelty. The method is the first to extend the traditional diffusion process to learnable variances within a unified theoretical framework.
2. Integration of stability mechanisms. The paper adopts stabilization tools inspired by a reinforcement-learning view of the problem, and Table 2 systematically evaluates the performance of each tool.
3. Scalability to high-dimensional tasks. Section 4.4 demonstrates that the method extends to higher-dimensional image generation tasks, wh

Weaknesses

1. Insufficient theoretical analysis. Although there is a unified framework with well-defined processes, no analysis of the convergence or gradient bias of the KL divergence is provided.
2. Lack of continuous-time analysis. There is no proof of the equivalence between the generation and the destruction processes as T goes to infinity.
3. Limited evaluation of TLM. The paper proposes both TB and TLM, but the main experiments are conducted with TB.

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare