Edit as You See: Image-guided Video Editing via Masked Motion Modeling

Zhi-Lin Huang; Yixuan Liu; Chujun Qin; Zhongdao Wang; Dong Zhou; Dong Li; Emad Barsoum

arXiv:2501.04325·cs.CV·January 9, 2025

Open Access

TL;DR

This paper introduces IVEDiff, a novel image-guided video editing diffusion model that maintains temporal consistency and high quality in edited videos using masked motion modeling and optical flow guidance.

Contribution

The paper proposes a new image-guided video editing model with learnable motion modules and a masked motion modeling strategy, enhancing temporal consistency and editing accuracy.

Findings

01 Produces temporally smooth edited videos

02 Effectively handles various editing objects

03 Maintains high editing quality

Abstract

Recent advancements in diffusion models have significantly facilitated text-guided video editing. However, there is relatively little research on image-guided video editing, a method that lets users edit videos by merely indicating a target object in the initial frame and providing an RGB image as reference, without relying on text prompts. In this paper, we propose a novel Image-guided Video Editing Diffusion model, termed IVEDiff. IVEDiff is built on top of image editing models and is equipped with learnable motion modules to maintain the temporal consistency of the edited video. Inspired by self-supervised learning concepts, we introduce a masked motion modeling fine-tuning strategy that strengthens the motion module's ability to capture inter-frame motion dynamics, while preserving the capabilities for intra-frame semantic…

Taxonomy

Topics: Digital Rights Management and Security · Human Motion and Animation · Video Analysis and Summarization

Methods: Diffusion · Balanced Selection