MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance

Ernie Chu; Tzuhsuan Huang; Shuo-Yen Lin; Jun-Cheng Chen

arXiv:2308.10079 · cs.CV · December 21, 2023

TL;DR

MeDM is a novel method that leverages pre-trained image diffusion models for consistent, efficient video-to-video translation and editing, ensuring temporal coherence without fine-tuning or test-time optimization.

Contribution

It introduces a framework that uses explicit optical flows to impose physical constraints on generated frames, enforcing temporal consistency. The framework is compatible with existing pre-trained image diffusion models and requires no additional training.
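
To make "temporal correspondence from optical flow" concrete, here is a minimal, hypothetical sketch (not code from the MeDM authors). It assumes a forward flow field of shape (T-1, H, W, 2) storing (dx, dy) per pixel, rounds each flow vector to the nearest pixel, and uses union-find to merge pixels that the flow links across consecutive frames. Each resulting group is treated as one shared physical point. The function name `build_correspondence_groups` and the flow layout are assumptions made for illustration.

```python
import numpy as np


def _find(parent, i):
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def build_correspondence_groups(flows, height, width):
    """Hypothetical helper: group pixels linked by forward optical flow.

    flows: array of shape (T-1, H, W, 2); flows[t, y, x] = (dx, dy) maps
    pixel (y, x) in frame t to (y + dy, x + dx) in frame t + 1 (assumed layout).
    Returns an int array of shape (T, H, W) with contiguous group labels.
    """
    num_frames = flows.shape[0] + 1
    parent = np.arange(num_frames * height * width)

    def flat_index(t, y, x):
        return (t * height + y) * width + x

    # Plain Python loops for clarity, not speed.
    for t in range(num_frames - 1):
        for y in range(height):
            for x in range(width):
                dx, dy = flows[t, y, x]
                x2 = int(np.rint(x + dx))
                y2 = int(np.rint(y + dy))
                # Pixels whose flow leaves the frame start no link.
                if 0 <= x2 < width and 0 <= y2 < height:
                    a = _find(parent, flat_index(t, y, x))
                    b = _find(parent, flat_index(t + 1, y2, x2))
                    if a != b:
                        parent[b] = a

    roots = np.array([_find(parent, i) for i in range(parent.size)])
    # Relabel roots as 0..K-1 so labels can index a compact array.
    _, labels = np.unique(roots, return_inverse=True)
    return labels.reshape(num_frames, height, width)
```

Hard, rounded assignments keep the sketch simple. A real system would also have to handle sub-pixel flow, occlusions, and flow errors, and this sketch does not attempt any of that.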

Findings

01. Achieves high-quality, temporally consistent video translation.
02. Outperforms existing methods on various benchmarks.
03. Enables text-guided video editing without fine-tuning.

Abstract

This study introduces an efficient and effective method, MeDM, that utilizes pre-trained image Diffusion Models for video-to-video translation with consistent temporal flow. The proposed framework can render videos from scene position information, such as a normal G-buffer, or perform text-guided editing on videos captured in real-world scenarios. We employ explicit optical flows to construct a practical coding that enforces physical constraints on generated frames and mediates independent frame-wise scores. By leveraging this coding, maintaining temporal consistency in the generated videos can be framed as an optimization problem with a closed-form solution. To ensure compatibility with Stable Diffusion, we also suggest a workaround for modifying observation-space scores in latent Diffusion Models. Notably, MeDM does not require fine-tuning or test-time optimization of the Diffusion…
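
The abstract says temporal consistency can be framed as an optimization problem with a closed-form solution. As a minimal sketch under one simplifying assumption, suppose the coding is a hard assignment g(i) of every frame pixel i to a shared physical point. Then the least-squares problem min over v of sum_i ||x_i - v_{g(i)}||^2 is solved by setting each v_g to the mean of the frame-wise predictions x_i assigned to it. Writing that mean back to every member pixel yields mutually consistent frames. This illustrates the general idea only and is not the authors' implementation. The names `mediate_frames`, `frames`, and `labels` are hypothetical.

```python
import numpy as np


def mediate_frames(frames, labels):
    """Hypothetical closed-form mediation under hard pixel correspondences.

    frames: float array (T, H, W, C) of independently predicted frames.
    labels: int array (T, H, W); pixels with equal labels are assumed to
    depict the same physical point.
    Returns frames in which every pixel holds its group's mean value,
    i.e. the least-squares solution for a shared per-group value.
    """
    flat_labels = labels.reshape(-1)
    flat_frames = frames.reshape(-1, frames.shape[-1]).astype(np.float64)
    num_groups = int(flat_labels.max()) + 1

    # Sum the predictions in each group (unbuffered, so repeated labels accumulate).
    sums = np.zeros((num_groups, flat_frames.shape[1]), dtype=np.float64)
    np.add.at(sums, flat_labels, flat_frames)

    # Group sizes; clamp to 1 so unused labels do not divide by zero.
    counts = np.bincount(flat_labels, minlength=num_groups).astype(np.float64)
    group_means = sums / np.maximum(counts, 1.0)[:, None]

    # Broadcast each group's mean back to all of its member pixels.
    return group_means[flat_labels].reshape(frames.shape)
```

Usage: with two 1x2 single-channel frames `[0, 2]` and `[4, 6]` and labels `[0, 1]` and `[0, 2]`, the two pixels sharing label 0 both become 2, so the output frames are `[2, 2]` and `[2, 6]`.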

Code & Models

Repositories

jwliao1209/diffqrcode (PyTorch)

Videos

MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance (Underline)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Cancer-related molecular mechanisms research

Methods: Diffusion