Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

Yuanhao Zhai; Kevin Lin; Zhengyuan Yang; Linjie Li; Jianfeng Wang; Chung-Ching Lin; David Doermann; Junsong Yuan; Lijuan Wang

arXiv:2406.06890 · cs.CV · October 29, 2024

TL;DR

This paper introduces the Motion Consistency Model (MCM), a single-stage video diffusion distillation approach that disentangles motion learning from appearance learning. It uses high-quality image data to improve frame quality and reports state-of-the-art video diffusion distillation results.

Contribution

The paper proposes two techniques: disentangled motion distillation and mixed trajectory distillation. Together they aim to improve video diffusion quality and to address training-inference discrepancies.

Findings

1. Achieves state-of-the-art video diffusion distillation performance.
2. Enhances frame quality, producing frames with high aesthetic scores or specific styles.
3. Effectively leverages high-quality image data for video frame enhancement.

Abstract

Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public video datasets. This affects the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation while also improving frame appearance using abundant high-quality image data. We propose the motion consistency model (MCM), a single-stage video diffusion distillation method that disentangles motion and appearance learning. Specifically, MCM includes a video consistency model that distills motion from the video teacher model, and an image discriminator that enhances frame appearance to match high-quality image data. This combination presents two challenges: (1) conflicting frame learning objectives, as…
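The abstract describes two training signals: a consistency term that distills from the video teacher, and an image discriminator that scores individual frames. The sketch below is a toy illustration of how two such terms could be summed into one objective. It is not the authors' implementation: the function names, the squared-error consistency term, the non-saturating adversarial term, and the `adv_weight` value are all assumptions made for illustration, and videos are represented as plain nested lists rather than tensors.

```python
import math


def consistency_loss(student_pred, target_pred):
    """Mean squared error between two videos given as [frame][value] lists.

    Assumption: a plain squared error stands in for whatever distance
    the actual method uses between student and target predictions.
    """
    total, count = 0.0, 0
    for frame_s, frame_t in zip(student_pred, target_pred):
        for a, b in zip(frame_s, frame_t):
            total += (a - b) ** 2
            count += 1
    return total / count


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def frame_adversarial_loss(frame_logits):
    """Non-saturating generator loss averaged over per-frame logits.

    Assumption: a hypothetical image discriminator has already produced
    one realism logit per generated frame.
    """
    return sum(softplus(-logit) for logit in frame_logits) / len(frame_logits)


def combined_loss(student_pred, target_pred, frame_logits, adv_weight=0.1):
    """Weighted sum of the two terms; adv_weight is a hypothetical value."""
    motion_term = consistency_loss(student_pred, target_pred)
    appearance_term = frame_adversarial_loss(frame_logits)
    return motion_term + adv_weight * appearance_term


if __name__ == "__main__":
    student = [[0.2, 0.4], [0.1, 0.3]]   # two frames, two values each
    target = [[0.0, 0.4], [0.1, 0.5]]
    logits = [0.5, -0.5]                 # one discriminator logit per frame
    print(combined_loss(student, target, logits))
```

In this toy setup the consistency term depends on the whole clip, while the adversarial term only ever sees single frames, which mirrors the abstract's split between motion and appearance at a very coarse level.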

Code & Models

Repositories

yhZhai/mcm (PyTorch, official)

Models

yhzhai/mcm (Hugging Face model)

Taxonomy

Topics: Image and Video Quality Assessment

Methods: Diffusion