RDPM: Solve Diffusion Probabilistic Models via Recurrent Token Prediction

Xiaoping Wu; Jie Hu; Xiaoming Wei

arXiv:2412.18390 · cs.CV · December 30, 2024


TL;DR

RDPM introduces a novel discrete diffusion framework with recurrent token prediction for high-fidelity image synthesis, enabling efficient multimodal generation by transforming continuous signals into discrete tokens.

Contribution

The paper pioneers Discrete Diffusion with recurrent token prediction, unifying image, video, audio, and text generation within a single diffusion-based model.

Findings

1. RDPM achieves superior image synthesis quality.
2. Requires only a few inference steps for high-quality results.
3. Enables unified multimodal generation across different data types.

Abstract

Diffusion Probabilistic Models (DPMs) have emerged as the de facto approach for high-fidelity image synthesis, running diffusion processes on continuous VAE latents, an approach that differs significantly from the text generation methods employed by Large Language Models (LLMs). In this paper, we introduce a novel generative framework, the Recurrent Diffusion Probabilistic Model (RDPM), which enhances the diffusion process through a recurrent token prediction mechanism, thereby pioneering the field of Discrete Diffusion. By progressively introducing Gaussian noise into the latent representations of images and encoding them into vector-quantized tokens in a recurrent manner, RDPM facilitates a unique diffusion process on discrete-value domains. This process iteratively predicts the token codes for subsequent timesteps, transforming the initial standard Gaussian noise into the source data…
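The forward and reverse processes described in the abstract can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the noise schedule x_t = sqrt(a_t) z + sqrt(1 - a_t) eps with a_t = 1 - t/T, the single shared noise draw, the nearest-neighbour codebook lookup, and every name (`token_trajectory`, `training_pairs`, `sample`, the `predict` callable) are hypothetical choices made for illustration. Each noisy latent is vector-quantized into token ids, and a model would be trained to predict the tokens of the next, less noisy timestep from the current ones.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def nearest_code(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry with the smallest squared L2 distance to vec."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])),
    )


def quantize(latent: Sequence[Vector], codebook) -> List[int]:
    """Vector-quantize a latent map (one vector per position) into token ids."""
    return [nearest_code(v, codebook) for v in latent]


def dequantize(tokens: Sequence[int], codebook) -> List[Vector]:
    """Look up the codebook vector for each token id."""
    return [list(codebook[t]) for t in tokens]


def noisy_latent(z, eps, t, num_steps):
    """x_t = sqrt(a) * z + sqrt(1 - a) * eps, with a = 1 - t / num_steps (assumed schedule)."""
    a = 1.0 - t / num_steps
    return [
        [math.sqrt(a) * zi + math.sqrt(1.0 - a) * ei for zi, ei in zip(zv, ev)]
        for zv, ev in zip(z, eps)
    ]


def token_trajectory(z, codebook, num_steps, rng):
    """Token ids for t = num_steps, ..., 0 (noisiest first), sharing one noise draw."""
    eps = [[rng.gauss(0.0, 1.0) for _ in v] for v in z]
    return [
        quantize(noisy_latent(z, eps, t, num_steps), codebook)
        for t in range(num_steps, -1, -1)
    ]


def training_pairs(trajectory):
    """((tokens at step t, t), tokens at step t - 1): the recurrent prediction target."""
    num_steps = len(trajectory) - 1
    return [
        ((trajectory[i], num_steps - i), trajectory[i + 1])
        for i in range(num_steps)
    ]


# Hypothetical model interface: (current tokens, timestep) -> tokens for timestep - 1.
Predictor = Callable[[List[int], int], List[int]]


def sample(predict: Predictor, codebook, num_positions, dim, num_steps, rng):
    """Start from Gaussian noise, tokenize it, then recurrently predict tokens down to t = 0."""
    x_T = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_positions)]
    tokens = quantize(x_T, codebook)
    for t in range(num_steps, 0, -1):
        tokens = predict(tokens, t)
    return dequantize(tokens, codebook)  # a VQ decoder would map this latent to pixels
```

With a trained network as `predict` (for example, a classifier over codebook indices fit on the pairs from `training_pairs`), `sample` runs the recurrent loop from pure noise to t = 0; the decoder that turns the final latent back into an image is omitted here.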

