Transition Matching Distillation for Fast Video Generation
Weili Nie, Julius Berner, Nanye Ma, Chao Liu, Saining Xie, Arash Vahdat

TL;DR
This paper introduces Transition Matching Distillation (TMD), a framework that distills large video diffusion models into efficient few-step generators, enabling faster video synthesis while maintaining quality.
Contribution
The paper proposes a novel TMD framework that matches multi-step denoising trajectories with few-step transitions using lightweight conditional flows, improving speed and quality in video generation.
Findings
TMD achieves a strong speed-quality trade-off in video generation.
TMD outperforms existing models at similar inference costs.
The method effectively distills large diffusion models into efficient generators.
Abstract
Large video diffusion and flow models have achieved remarkable success in high-quality video generation, but their use in real-time interactive applications remains limited due to their inefficient multi-step sampling process. In this work, we present Transition Matching Distillation (TMD), a novel framework for distilling video diffusion models into efficient few-step generators. The central idea of TMD is to match the multi-step denoising trajectory of a diffusion model with a few-step probability transition process, where each transition is modeled as a lightweight conditional flow. To enable efficient distillation, we decompose the original diffusion backbone into two components: (1) a main backbone, comprising the majority of early layers, that extracts semantic representations at each outer transition step; and (2) a flow head, consisting of the last few layers, that leverages…
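The backbone/flow-head split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the layers are hypothetical affine maps, the names `split_layers` and `sample` are invented for illustration, and the way the head is conditioned on backbone features (simple addition here) is an assumption, since the abstract is cut off before it describes that step. The sketch only shows the cost structure: the main backbone runs once per outer transition step, while the small head runs several times inside an inner flow.

```python
import random

def make_layer(scale):
    # Hypothetical stand-in for one network layer: an elementwise affine map.
    return lambda x: [scale * v + 0.1 for v in x]

def split_layers(layers, num_head_layers):
    """Split a layer stack into (main backbone, flow head)."""
    cut = len(layers) - num_head_layers
    return layers[:cut], layers[cut:]

def run(layers, x):
    # Apply the layers in order.
    for layer in layers:
        x = layer(x)
    return x

def sample(layers, num_head_layers, outer_steps, inner_steps, dim=4, seed=0):
    backbone, head = split_layers(layers, num_head_layers)
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    layer_calls = 0
    for _ in range(outer_steps):
        # The expensive backbone is evaluated once per outer transition.
        feats = run(backbone, x)
        layer_calls += len(backbone)
        # The inner flow starts from fresh noise and is integrated by the head.
        y = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(inner_steps):
            # Assumed conditioning: add backbone features to the head input.
            v = run(head, [a + b for a, b in zip(y, feats)])
            layer_calls += len(head)
            # One Euler step along the predicted velocity.
            y = [a + b / inner_steps for a, b in zip(y, v)]
        x = y
    return x, layer_calls

layers = [make_layer(0.5) for _ in range(10)]
x, calls = sample(layers, num_head_layers=2, outer_steps=2, inner_steps=4)
full_cost = 2 * 4 * len(layers)  # all 10 layers at each of the 8 steps
print(calls, full_cost)
```

In this toy setting, two outer steps with four inner head steps each cost 32 layer evaluations, against 80 if the full 10-layer stack were run at all eight steps. The numbers say nothing about the real model; they only show why reusing backbone features across inner steps is cheaper.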
Taxonomy
Topics: Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging
