Distillation of Discrete Diffusion through Dimensional Correlations

Satoshi Hayakawa; Yuhta Takida; Masaaki Imaizumi; Hiromi Wakaki; Yuki Mitsufuji

arXiv:2410.08709 · cs.LG · May 12, 2025

TL;DR

This paper introduces scalable mixture models for discrete diffusion that capture dimensional correlations, enabling distillation of slow, iterative models into faster, few-step versions for image and language generation.

Contribution

The paper proposes mixture models and loss functions for distilling discrete diffusion models, reducing the number of sampling steps while preserving sample quality.

Findings

01. Effective distillation of pretrained models in image and language domains
02. Significant reduction in sampling steps with maintained performance
03. Scalable approach for modeling dependencies in high-dimensional discrete data

Abstract

Diffusion models have demonstrated exceptional performance in various fields of generative modeling, but suffer from slow sampling due to their iterative nature. While this issue is being addressed in continuous domains, discrete diffusion models face unique challenges, particularly in capturing dependencies between elements (e.g., pixel relationships in images, sequential dependencies in language), mainly because of the computational cost of processing high-dimensional joint distributions. In this paper, (i) we propose "mixture" models for discrete diffusion that can treat dimensional correlations while remaining scalable, and (ii) we provide a set of loss functions for distilling the iterations of existing models. Two primary theoretical insights underpin our approach: First, conventional models with element-wise independence can well approximate the data distribution,…
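To make the idea of "dimensional correlations" concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation and does not involve a diffusion model: it only compares an element-wise independent (product) distribution with a mixture of product distributions on two perfectly correlated binary variables. The function names `product_joint`, `mixture_joint`, and `total_variation`, and the toy target itself, are assumptions made for illustration.

```python
from itertools import product
from math import prod


def product_joint(marginals):
    """Joint pmf of a dimensionally independent model.

    `marginals` is a list of per-dimension probability vectors.
    Returns a dict mapping each state tuple to its probability.
    """
    supports = [range(len(m)) for m in marginals]
    return {
        x: prod(m[i] for m, i in zip(marginals, x))
        for x in product(*supports)
    }


def mixture_joint(weights, components):
    """Joint pmf of a mixture of product distributions (hypothetical helper)."""
    joint = {}
    for w, marginals in zip(weights, components):
        for x, p in product_joint(marginals).items():
            joint[x] = joint.get(x, 0.0) + w * p
    return joint


def total_variation(p, q):
    """Total variation distance between two pmfs stored as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy target: two binary dimensions that always agree.
target = {(0, 0): 0.5, (1, 1): 0.5}

# Product model with the correct marginals: it cannot express the agreement.
independent = product_joint([[0.5, 0.5], [0.5, 0.5]])

# Mixture of two product distributions, one per mode of the target.
mixture = mixture_joint(
    [0.5, 0.5],
    [
        [[1.0, 0.0], [1.0, 0.0]],  # component concentrated on (0, 0)
        [[0.0, 1.0], [0.0, 1.0]],  # component concentrated on (1, 1)
    ],
)

tv_independent = total_variation(target, independent)
tv_mixture = total_variation(target, mixture)
print(tv_independent, tv_mixture)
```

In this toy case the product model stays at total variation distance 0.5 from the target even with exact marginals, while the two-component mixture matches it exactly. Each mixture component is still a product distribution, so the cost of evaluating it grows with the number of components rather than with the size of the joint state space.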

Code & Models

Repositories

sony/di4c (PyTorch, official)

Taxonomy

Topics: Process Optimization and Integration

Methods: Sparse Evolutionary Training · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion