Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics

Sebastian Sanokowski; Wilhelm Berghammer; Martin Ennemoser; Haoyu Peter Wang; Sepp Hochreiter; Sebastian Lehner

arXiv:2502.08696·cs.LG·July 9, 2025

Open Access·3 Reviews

TL;DR

This paper introduces scalable, memory-efficient discrete diffusion samplers with novel training methods, enabling unbiased sampling and outperforming existing approaches in combinatorial optimization and statistical physics applications.

Contribution

The paper presents two new training techniques for discrete diffusion models, improving scalability and enabling unbiased sampling in complex discrete domains.

Findings

01

Outperforms autoregressive methods on Ising model benchmarks

02

Achieves state-of-the-art results in combinatorial optimization

03

Enables unbiased sampling with adapted diffusion models

Abstract

Learning to sample from complex unnormalized distributions over discrete domains has emerged as a promising research direction with applications in statistical physics, variational inference, and combinatorial optimization. Recent work has demonstrated the potential of diffusion models in this domain. However, existing methods face limitations in memory scaling, and thus in the number of attainable diffusion steps, since they require backpropagation through the entire generative process. To overcome these limitations we introduce two novel training methods for discrete diffusion samplers, one grounded in the policy gradient theorem and the other leveraging Self-Normalized Neural Importance Sampling (SN-NIS). These methods yield memory-efficient training and achieve state-of-the-art results in unsupervised combinatorial optimization. Numerous scientific applications additionally require the…
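The policy-gradient idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's diffusion sampler: it assumes a one-step, fully factorized Bernoulli sampler over spins and a small open Ising chain as the target, and every name in it (`chain_energy`, `sample_spins`, `reverse_kl_score_gradient`, `train`) is hypothetical. It shows only that a score-function estimate of the reverse KL gradient needs log-probabilities of sampled states, so nothing has to be differentiated through the sampling itself.

```python
import numpy as np

def chain_energy(spins):
    """Energy of an open ferromagnetic Ising chain; spins has shape (..., n), entries +1/-1."""
    return -np.sum(spins[..., :-1] * spins[..., 1:], axis=-1)

def sample_spins(logits, num_samples, rng):
    """Draw independent spins with P(s_i = +1) = sigmoid(logits[i])."""
    p_up = 1.0 / (1.0 + np.exp(-logits))
    up = rng.random((num_samples, logits.size)) < p_up
    return np.where(up, 1.0, -1.0)

def log_prob(logits, spins):
    """log q(s) of the factorized sampler, using log sigmoid(x) = -log(1 + exp(-x))."""
    return -np.sum(np.logaddexp(0.0, -spins * logits), axis=-1)

def reverse_kl_score_gradient(logits, spins, beta):
    """Score-function estimate of d/d logits of E_q[log q(s) + beta * E(s)]."""
    p_up = 1.0 / (1.0 + np.exp(-logits))
    cost = log_prob(logits, spins) + beta * chain_energy(spins)
    # Batch-mean baseline for variance reduction (an illustrative choice).
    advantage = cost - cost.mean()
    # d log q(s) / d logits_i = 1[s_i = +1] - sigmoid(logits_i)
    score = (spins + 1.0) / 2.0 - p_up
    return np.mean(advantage[:, None] * score, axis=0)

def train(num_spins=6, beta=1.0, steps=200, batch=2048, lr=0.5, seed=0):
    """Plain gradient descent on the reverse KL; returns final logits and mean-energy history."""
    rng = np.random.default_rng(seed)
    logits = np.full(num_spins, 0.5)  # small offset to break the up/down symmetry
    history = []
    for _ in range(steps):
        spins = sample_spins(logits, batch, rng)
        history.append(chain_energy(spins).mean())
        logits = logits - lr * reverse_kl_score_gradient(logits, spins, beta)
    return logits, history

if __name__ == "__main__":
    final_logits, energies = train()
    print("mean energy, first vs. last step:", energies[0], energies[-1])
```

In this toy setting the sampled states are treated as fixed data when the gradient is formed, which is the property that avoids storing a computation graph for the whole generative process. How the paper applies the idea across many diffusion steps is not covered by this sketch.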

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 6·Confidence 4

Strengths

* The innovative approach to computing the forward and reverse KL loss functions is the paper's main strength. Developing memory-efficient loss functions is particularly important for diffusion models.
* The paper is clearly written and all the results are well presented.

Weaknesses

* The empirical validation of the proposed algorithms is insufficient to convincingly demonstrate their practical benefits. It would be more compelling if the authors included comparisons to standard diffusion-based samplers under the same computational constraints or memory budget in the Ising model experiments. The current comparisons against the AR baselines do not adequately highlight the potential advantages of the proposed methods over conventional diffusion-based samplers.
* The experime

Reviewer 02·Rating 6·Confidence 3

Strengths

- The paper introduces two novel methods (RL-based reverse KL and SN-NIS-based forward KL) to address memory scaling issues, enabling the use of more diffusion steps while staying within fixed memory constraints.
- By extending importance sampling and Markov Chain Monte Carlo methods to diffusion models, the authors enable unbiased sampling, a critical requirement for scientific applications that need accurate expectation estimates.
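The role of importance sampling in obtaining accurate expectation estimates can be sketched as follows. This is a generic self-normalized importance sampling estimator, not the paper's SN-NIS training objective or its diffusion-specific extension. The target is assumed to be a Boltzmann distribution over a small open Ising chain, the proposal is assumed to be uniform, and all function names are hypothetical. Note that the self-normalized estimator is consistent (asymptotically unbiased) rather than exactly unbiased at finite sample size.

```python
import itertools
import numpy as np

def ising_chain_energy(spins):
    """Energy of an open ferromagnetic Ising chain; spins has shape (..., n), entries +1/-1."""
    return -np.sum(spins[..., :-1] * spins[..., 1:], axis=-1)

def snis_expectation(values, log_p_tilde, log_q):
    """Self-normalized importance sampling estimate of E_p[f].

    values      : f(x_k) for samples x_k drawn from the proposal q
    log_p_tilde : unnormalized target log-density at each sample
    log_q       : proposal log-probability at each sample
    """
    log_w = log_p_tilde - log_q
    weights = np.exp(log_w - np.max(log_w))  # shift for numerical stability
    return np.sum(weights * values) / np.sum(weights)

def exact_expectation(num_spins, beta, f):
    """Brute-force E_p[f] by enumerating all 2**num_spins states (small systems only)."""
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=num_spins)))
    log_p = -beta * ising_chain_energy(states)
    probs = np.exp(log_p - log_p.max())
    probs /= probs.sum()
    return np.sum(probs * f(states))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, beta = 4, 0.5
    samples = rng.choice([-1.0, 1.0], size=(20000, n))    # uniform proposal
    log_q = np.full(len(samples), -n * np.log(2.0))
    energies = ising_chain_energy(samples)
    estimate = snis_expectation(energies, -beta * energies, log_q)
    print("SNIS estimate of mean energy:", estimate)
    print("exact mean energy:           ", exact_expectation(n, beta, ising_chain_energy))
```

Because the weights are normalized by their own sum, the unknown partition function of the target cancels, which is why only the unnormalized log-density is needed. In practice a learned sampler would replace the uniform proposal, and the closer it is to the target, the lower the variance of the estimate.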

Weaknesses

- Although the paper reports inference time alongside results, a more comprehensive analysis of computational costs is needed, particularly in terms of training time and overall efficiency.
- The paper would benefit from a deeper theoretical exploration of the convergence properties of the proposed methods, especially the RL-based approach. Clarifying the conditions under which convergence is guaranteed would strengthen the paper's contributions.
- The experiments lack ablation studies to isolat

Reviewer 03·Rating 6·Confidence 2

Strengths

The paper provides two different ways of solving the memory issue that arises in discrete diffusion models when using the reverse KL. A reasonable list of optimization benchmarks is provided.

Weaknesses

The Ising sampling benchmark is adequate, but more difficult cases could have been expected. For instance, the 2D spin glass ($J = \pm 1$) might be a more challenging task that is not too different from the Ising case considered in the paper, and for which a polynomial algorithm exists, at least for the ground states. In relation to both their work and this case, the authors can also refer to the following article: https://arxiv.org/pdf/2407.19483 which studies, using a different approach, the sampling on the 2

Taxonomy

Topics·Topological and Geometric Data Analysis

Methods·Diffusion