RDPM: Solve Diffusion Probabilistic Models via Recurrent Token Prediction
Xiaoping Wu, Jie Hu, Xiaoming Wei

TL;DR
RDPM introduces a novel discrete diffusion framework with recurrent token prediction for high-fidelity image synthesis, enabling efficient multimodal generation by transforming continuous signals into discrete tokens.
Contribution
The paper pioneers Discrete Diffusion with recurrent token prediction, unifying image, video, audio, and text generation within a single diffusion-based model.
Findings
RDPM achieves superior image synthesis quality.
It requires only a few inference steps for high-quality results.
It enables unified multimodal generation across different data types.
Abstract
Diffusion Probabilistic Models (DPMs) have emerged as the de facto approach for high-fidelity image synthesis. They run the diffusion process on continuous VAE latents, which differs significantly from the text generation methods employed by Large Language Models (LLMs). In this paper, we introduce a novel generative framework, the Recurrent Diffusion Probabilistic Model (RDPM), which enhances the diffusion process through a recurrent token prediction mechanism, thereby pioneering the field of Discrete Diffusion. By progressively introducing Gaussian noise into the latent representations of images and encoding them into vector-quantized tokens in a recurrent manner, RDPM facilitates a unique diffusion process on discrete-valued domains. This process iteratively predicts the token codes for subsequent timesteps, transforming the initial standard Gaussian noise into the source data…
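The forward process described in the abstract (noise the latents step by step, then vector-quantize each noisy latent into a token) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the random codebook, the linear noise schedule, the reuse of one noise sample across timesteps, and the names `quantize` and `forward_token_trajectory` are all assumptions made for illustration.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry."""
    # Squared Euclidean distance between every latent (N, D) and code (K, D).
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def forward_token_trajectory(latents, codebook, num_steps, rng):
    """Noise the latents progressively and tokenize every timestep."""
    # Assumption: one Gaussian noise sample is shared by all timesteps.
    noise = rng.standard_normal(latents.shape)
    tokens = []
    for t in range(num_steps + 1):
        # Hypothetical linear schedule: alpha goes from 1 (clean) to 0 (pure noise).
        alpha = 1.0 - t / num_steps
        noisy = np.sqrt(alpha) * latents + np.sqrt(1.0 - alpha) * noise
        tokens.append(quantize(noisy, codebook))
    return np.stack(tokens)  # shape: (num_steps + 1, N)

rng = np.random.default_rng(0)
codebook = rng.standard_normal((16, 4))    # toy codebook: 16 codes of dimension 4
clean_ids = rng.integers(0, 16, size=8)    # toy "image" made of 8 tokens
latents = codebook[clean_ids]              # clean latents sit exactly on codes
trajectory = forward_token_trajectory(latents, codebook, num_steps=5, rng=rng)
```

Under these assumptions, row 0 of `trajectory` holds the clean tokens and the last row holds the tokens of pure Gaussian noise. Adjacent rows, `trajectory[t]` and `trajectory[t - 1]`, could serve as input and target for a recurrent token predictor of the kind the abstract describes.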
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Topic Modeling
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion
