Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang

TL;DR
This paper introduces the Motion Consistency Model (MCM), a novel single-stage video diffusion distillation approach that disentangles motion and appearance learning, leveraging high-quality image data to improve frame quality and achieve state-of-the-art results.
Contribution
The paper proposes a disentangled motion distillation and mixed trajectory distillation method to enhance video diffusion quality and address training-inference discrepancies.
Findings
Achieves state-of-the-art video diffusion distillation performance.
Enhances frame quality with high aesthetic scores or specific styles.
Effectively leverages high-quality image data for video frame enhancement.
Abstract
Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public video datasets. This affects the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation while improving frame appearance using abundant high-quality image data. We propose motion consistency model (MCM), a single-stage video diffusion distillation method that disentangles motion and appearance learning. Specifically, MCM includes a video consistency model that distills motion from the video teacher model, and an image discriminator that enhances frame appearance to match high-quality image data. This combination presents two challenges: (1) conflicting frame learning objectives, as…
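The two components named in the abstract, a video consistency model that distills motion and an image discriminator that improves frame appearance, suggest a student objective with two terms. The following is a minimal sketch of how such terms might be combined, not the authors' implementation. The Huber distance, the non-saturating adversarial loss, the `adv_weight` value, and all function names are assumptions chosen for illustration; videos are represented as flat lists of floats to keep the example dependency-free.

```python
import math


def huber(pred, target, delta=1.0):
    """Mean Huber distance between two equal-length flat sequences."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for large residuals.
        total += 0.5 * d * d if d <= delta else delta * (d - 0.5 * delta)
    return total / len(pred)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def generator_adv_loss(frame_logits):
    """Non-saturating GAN loss for the student, averaged over frames.

    frame_logits holds one (hypothetical) image-discriminator logit per
    generated frame; larger means "looks like high-quality image data".
    """
    return sum(softplus(-z) for z in frame_logits) / len(frame_logits)


def student_loss(student_video, target_video, frame_logits, adv_weight=0.1):
    """Motion term (distance to the distillation target) plus a weighted
    per-frame appearance term. adv_weight is an assumed hyperparameter."""
    motion = huber(student_video, target_video)
    appearance = generator_adv_loss(frame_logits)
    return motion + adv_weight * appearance
```

In this sketch the motion term only sees the video-level prediction and its distillation target, while the appearance term only sees per-frame discriminator scores, which is one simple way to keep the two learning signals separate.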
Taxonomy
Topics: Image and Video Quality Assessment
Methods: Diffusion
