TL;DR
Neural CTMC is a discrete diffusion model that parameterizes jump timing and jump direction separately, which improves generative performance on language datasets.
Contribution
The paper proposes a neural framework that exploits the Poisson structure of CTMCs, factorizing the reverse process into a timing component (the exit rate) and a direction component (the jump distribution).
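To make the timing/direction split concrete, here is a minimal sketch for a single state of a toy, time-homogeneous CTMC. It is not the paper's implementation: the function names `split_rates` and `sample_jump` and the example rates are hypothetical, and in the paper both quantities are time-dependent outputs of network heads rather than fixed numbers.

```python
import numpy as np

def split_rates(rates):
    """Split the jump rates out of one state into (exit_rate, jump_dist).

    `rates` holds the off-diagonal rates toward every other state
    (hypothetical toy input, assumed strictly positive in total).
    """
    rates = np.asarray(rates, dtype=float)
    exit_rate = rates.sum()          # timing: total rate of leaving the state
    jump_dist = rates / exit_rate    # direction: where to go, given a jump
    return exit_rate, jump_dist

def sample_jump(exit_rate, jump_dist, rng):
    """Sample one jump: an exponential holding time, then a categorical target."""
    holding_time = rng.exponential(1.0 / exit_rate)   # scale = 1 / rate
    target = rng.choice(len(jump_dist), p=jump_dist)
    return holding_time, target

rates = [0.5, 1.5, 2.0]              # illustrative values only
exit_rate, jump_dist = split_rates(rates)
rng = np.random.default_rng(0)
holding_time, target = sample_jump(exit_rate, jump_dist, rng)
```

The product `exit_rate * jump_dist` recovers the original rates, so the two parts together carry the same information as one row of the rate matrix.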
Findings
Achieves 16.36 perplexity on TinyStories, outperforming previous methods.
Attains the best perplexity on OpenWebText at various sampling steps.
Releases pretrained weights for reproducibility.
Abstract
Discrete diffusion models based on continuous-time Markov chains (CTMCs) have shown strong performance on language and discrete data generation, yet existing approaches typically parameterize the reverse rate matrix monolithically -- through proxies such as concrete scores (SEDD) or clean-data predictions (MDLM, GIDD) -- rather than aligning the parameterization with the intrinsic CTMC decomposition into jump timing and jump direction. We propose Neural CTMC, which exploits the underlying Poisson structure of CTMC dynamics by separately parameterizing the reverse process through an exit rate (when to jump) and a jump distribution (where to jump) via two dedicated network heads. We show that the evidence lower bound (ELBO) reduces to a path-space KL divergence between the true and learned reverse processes that factorizes into a Poisson KL for timing and a…
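The abstract is cut off before it names the second term, so the paper's exact objective is not visible here. As a hedged illustration only, the sketch below checks a standard identity for the per-unit-time KL between two jump-rate vectors: when each is written as exit rate times jump distribution, the KL splits into a Poisson-style term in the exit rates plus the exit rate times a categorical KL between the jump distributions. Treating that categorical term as the direction part is an assumption, and all function names and numbers are hypothetical.

```python
import numpy as np

def rate_kl(r, r_hat):
    """Per-unit-time KL between two jump-rate vectors out of the same state."""
    r, r_hat = np.asarray(r, float), np.asarray(r_hat, float)
    return float(np.sum(r * np.log(r / r_hat) - r + r_hat))

def poisson_kl(lam, lam_hat):
    """KL between Poisson intensities lam and lam_hat (timing term)."""
    return float(lam * np.log(lam / lam_hat) - lam + lam_hat)

def categorical_kl(p, p_hat):
    """KL between two categorical distributions (assumed direction term)."""
    p, p_hat = np.asarray(p, float), np.asarray(p_hat, float)
    return float(np.sum(p * np.log(p / p_hat)))

# Illustrative "true" and "learned" quantities; not taken from the paper.
lam, p = 4.0, np.array([0.125, 0.375, 0.5])
lam_hat, p_hat = 3.0, np.array([0.2, 0.3, 0.5])

lhs = rate_kl(lam * p, lam_hat * p_hat)
rhs = poisson_kl(lam, lam_hat) + lam * categorical_kl(p, p_hat)
```

Here `lhs` and `rhs` agree up to floating-point error, which is what allows a timing head and a direction head to be trained against separate terms.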
