Transition Matching: Scalable and Flexible Generative Modeling
Neta Shaul, Uriel Singer, Itai Gat, Yaron Lipman

TL;DR
Transition Matching introduces a unified generative framework that combines diffusion, flow, and autoregressive models, enabling flexible, scalable, and high-quality media generation with new variants that improve efficiency and performance.
Contribution
The paper proposes Transition Matching (TM), a discrete-time, continuous-state generative paradigm that unifies diffusion, flow, and continuous autoregressive generation, and explores it through three variants reported to match or improve on existing methods.
Findings
DTM achieves state-of-the-art image quality and text adherence.
FHTM matches or surpasses flow-based methods on text-to-image tasks.
TM variants enable flexible, efficient, and high-quality media generation.
Abstract
Diffusion and flow matching models have significantly advanced media generation, yet their design space is well-explored, somewhat limiting further improvements. Concurrently, autoregressive (AR) models, particularly those generating continuous tokens, have emerged as a promising direction for unifying text and media generation. This paper introduces Transition Matching (TM), a novel discrete-time, continuous-state generative paradigm that unifies and advances both diffusion/flow models and continuous AR generation. TM decomposes complex generation tasks into simpler Markov transitions, allowing for expressive non-deterministic probability transition kernels and arbitrary non-continuous supervision processes, thereby unlocking new flexible design avenues. We explore these choices through three TM variants: (i) Difference Transition Matching (DTM), which generalizes flow matching to…
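The idea of decomposing generation into Markov transitions can be illustrated with a toy sketch. This is not the paper's algorithm: `sample_chain`, `toy_kernel`, the scalar state, and the Gaussian noise are all hypothetical choices made here only to show what a discrete-time, continuous-state chain with a non-deterministic transition kernel looks like. In a real model the kernel would presumably be a learned network rather than a hand-written rule.

```python
import random


def sample_chain(kernel, x0, num_steps):
    """Run a discrete-time Markov chain: x_{t+1} is drawn from kernel(x_t, t).

    Each step depends only on the current state and the step index,
    which is what makes the process Markov.
    """
    x = x0
    trajectory = [x]
    for t in range(num_steps):
        x = kernel(x, t)
        trajectory.append(x)
    return trajectory


def toy_kernel(x, t, target=3.0, num_steps=8, noise=0.1):
    """Hypothetical stochastic kernel (an assumption, not from the paper).

    Moves the state an equal share of the remaining distance toward
    `target`, then adds Gaussian noise so the transition is
    non-deterministic.
    """
    step = (target - x) / (num_steps - t)
    return x + step + random.gauss(0.0, noise)


if __name__ == "__main__":
    # Start from a "noise" sample and apply 8 simple transitions.
    start = random.gauss(0.0, 1.0)
    for state in sample_chain(toy_kernel, start, num_steps=8):
        print(round(state, 3))
```

Each transition here is simple on its own, while the composition of all steps carries the state from the starting sample toward the target. Setting `noise=0.0` makes the kernel deterministic, which is a convenient way to check the chain logic.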
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Digital Humanities and Scholarship
