VDOT: Efficient Unified Video Creation via Optimal Transport Distillation

Yutong Wang; Haiyu Zhang; Tianfan Xue; Yu Qiao; Yaohui Wang; Chang Xu; Xinyuan Chen

arXiv:2512.06802·cs.CV·December 23, 2025

Open Access

TL;DR

VDOT introduces an efficient, unified video creation model using optimal transport distillation, significantly reducing generation time while maintaining high quality, and providing a standardized benchmark for evaluation.

Contribution

The paper proposes a novel optimal transport-based distillation method for unified video creation, improving efficiency and stability over traditional KL-based approaches.

Findings

01 VDOT outperforms baselines while using fewer steps.

02 With only 4 steps, VDOT achieves quality comparable to generation with 100 steps.

03 Provides a new benchmark for evaluating unified video creation.

Abstract

The rapid development of generative models has significantly advanced image and video applications. Among these, video creation, which aims to generate videos under various conditions, has gained substantial attention. However, existing video creation models either focus on only a few specific conditions or suffer from excessively long generation times due to complex model inference, making them impractical for real-world applications. To mitigate these issues, we propose an efficient unified video creation model named VDOT. Concretely, we model the training process with the distribution matching distillation (DMD) paradigm. Rather than relying on Kullback-Leibler (KL) minimization alone, we additionally employ a novel computational optimal transport (OT) technique to optimize the discrepancy between the real and fake score distributions. The OT distance inherently imposes geometric…
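The abstract contrasts KL minimization with an OT distance, but the excerpt does not give VDOT's formulation. As a rough illustration of what a computational OT distance looks like in general, the sketch below runs textbook Sinkhorn iterations (entropy-regularized OT) between two small point sets. It is not the paper's method: the function name `sinkhorn_cost`, the squared Euclidean ground cost, the uniform weights, and the defaults for `eps` and `n_iters` are all assumptions made for illustration.

```python
import math


def sinkhorn_cost(xs, ys, eps=0.1, n_iters=200):
    """Entropy-regularized OT cost between two uniformly weighted point sets.

    Generic Sinkhorn sketch, NOT the VDOT objective.
    xs, ys: lists of points (tuples of floats) with the same dimension.
    eps: regularization strength, relative to the mean pairwise cost
         (hypothetical default, not taken from the paper).
    """
    n, m = len(xs), len(ys)
    a = [1.0 / n] * n  # uniform source weights (assumption)
    b = [1.0 / m] * m  # uniform target weights (assumption)

    # Ground cost: squared Euclidean distance between every pair of points.
    cost = [[sum((p - q) ** 2 for p, q in zip(x, y)) for y in ys] for x in xs]

    # Scale the regularizer by the mean cost so exp() stays well conditioned.
    mean_cost = sum(map(sum, cost)) / (n * m) or 1.0
    kernel = [[math.exp(-c / (eps * mean_cost)) for c in row] for row in cost]

    # Alternately rescale rows and columns to match the two marginals.
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]

    # Transport plan and the cost it incurs under the ground cost.
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    total = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total, plan


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    near = [(x + 0.1, y) for x, y in square]
    far = [(x + 10.0, y) for x, y in square]
    print("near:", sinkhorn_cost(square, near)[0])
    print("far: ", sinkhorn_cost(square, far)[0])
```

Unlike a KL divergence, the returned cost grows with how far mass has to move, which is the geometric sensitivity the abstract alludes to. How VDOT applies OT to the real and fake score distributions is not specified in this excerpt.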

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Visual Attention and Saliency Detection