VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide

Dohun Lee; Bryan S Kim; Geon Yeong Park; Jong Chul Ye

arXiv:2410.04364 · cs.CV · December 10, 2024

Open Access

TL;DR

VideoGuide improves the temporal consistency of pretrained text-to-video diffusion models at inference time, with no additional training. It uses a pretrained video diffusion model (or the sampling model itself) as a guide during the early denoising steps, leading to better video quality and coherence.

Contribution

The paper introduces a training-free framework that improves temporal consistency in text-to-video (T2V) models by using pretrained video diffusion models as guides during inference.

Findings

01. Significant improvement in temporal consistency and image fidelity.

02. Cost-effective method without additional training or fine-tuning.

03. Enhanced text coherence via prior distillation.

Abstract

Text-to-image (T2I) diffusion models have revolutionized visual content creation, but extending these capabilities to text-to-video (T2V) generation remains a challenge, particularly in preserving temporal consistency. Existing methods that aim to improve consistency often cause trade-offs such as reduced imaging quality and impractical computational time. To address these issues we introduce VideoGuide, a novel framework that enhances the temporal consistency of pretrained T2V models without the need for additional training or fine-tuning. Instead, VideoGuide leverages any pretrained video diffusion model (VDM) or itself as a guide during the early stages of inference, improving temporal quality by interpolating the guiding model's denoised samples into the sampling model's denoising process. The proposed method brings about significant improvement in temporal consistency and image…
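
The interpolation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`interpolate`, `guided_denoise`), the blend weight `beta`, the `guided_steps` cutoff, and the simplified "move halfway toward the denoised estimate" update are all assumptions made for illustration, and real video latents would be tensors rather than short lists of floats. It only shows the general shape of blending a guiding model's denoised estimate into the sampling model's estimate during the early steps.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical denoiser signature: (noisy sample, step index) -> denoised estimate.
Denoiser = Callable[[Vector, int], Vector]


def interpolate(sample_x0: Vector, guide_x0: Vector, beta: float) -> Vector:
    """Blend two denoised estimates: beta * sample + (1 - beta) * guide."""
    return [beta * s + (1.0 - beta) * g for s, g in zip(sample_x0, guide_x0)]


def guided_denoise(
    x: Vector,
    sampler: Denoiser,
    guide: Denoiser,
    num_steps: int,
    guided_steps: int,
    beta: float,
) -> Vector:
    """Toy denoising loop with guidance applied only in the first `guided_steps` steps."""
    for step in range(num_steps):
        x0 = sampler(x, step)
        if step < guided_steps:
            # Early steps: mix in the guiding model's denoised estimate.
            x0 = interpolate(x0, guide(x, step), beta)
        # Assumed toy update rule: move halfway toward the denoised estimate.
        x = [xi + 0.5 * (x0i - xi) for xi, x0i in zip(x, x0)]
    return x


if __name__ == "__main__":
    # Stand-in denoisers that always predict a constant clean sample.
    sampler = lambda x, step: [0.0 for _ in x]
    guide = lambda x, step: [2.0 for _ in x]
    print(guided_denoise([4.0], sampler, guide, num_steps=2, guided_steps=2, beta=0.5))
```

Setting `guided_steps=0` disables guidance entirely, which makes it easy to compare the guided and unguided trajectories in this toy setting.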

Taxonomy

Topics: Multimedia Communication and Technology

Methods: Balanced Selection · Diffusion