MVOC: a training-free multiple video object composition method with diffusion models

Wei Wang; Yaosen Chen; Yuegen Liu; Qi Yuan; Shubin Yang; Yanru Zhang

arXiv:2406.15829 · cs.CV · June 25, 2024

TL;DR

MVOC introduces a training-free diffusion-based method for multi-object video composition that maintains object motion, identity, and interaction effects, outperforming existing approaches.

Contribution

The paper presents a novel training-free, diffusion-based approach to multi-object video composition that keeps each object's motion and identity consistent and models the interactions between objects.

Findings

1. Outperforms state-of-the-art methods in video composition tasks.
2. Maintains object motion and identity in generated videos.
3. Effectively models interaction effects between objects.

Abstract

Video composition is the core task of video editing. Although diffusion-based image composition has been highly successful, extending that success to video object composition is not straightforward: the composited video must not only exhibit the corresponding interaction effects between objects but also keep each object's motion and identity consistent, both of which are necessary for a physically harmonious video. To address this challenge, we propose a Multiple Video Object Composition (MVOC) method based on diffusion models. Specifically, we first perform DDIM inversion on each video object to obtain the corresponding noise features. Second, we combine and edit the objects with image editing methods to obtain the first frame of the composited video. Finally, we use an image-to-video generation model to composite the video with feature and attention injections in the…
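
The first step, DDIM inversion of each video object, can be illustrated with the minimal pure-Python toy below. It is a sketch under stated assumptions, not the MVOC implementation: the linear beta schedule, the 50-level subsequence, the 8-dimensional latents, the object names, and `make_toy_eps_model` are all hypothetical choices made for illustration. The toy noise predictor is the exact one for data concentrated at a single point `x0_ref`, which makes deterministic (eta = 0) DDIM inversion exactly reversible and therefore easy to check.

```python
import math
import random


def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear beta schedule (assumed values)."""
    alpha_bars, prod = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def ddim_step(x, eps, a_from, a_to):
    """Deterministic (eta = 0) DDIM update from noise level a_from to a_to."""
    x0_pred = [(xi - math.sqrt(1 - a_from) * ei) / math.sqrt(a_from)
               for xi, ei in zip(x, eps)]
    return [math.sqrt(a_to) * x0 + math.sqrt(1 - a_to) * ei
            for x0, ei in zip(x0_pred, eps)]


def make_toy_eps_model(x0_ref):
    """Hypothetical stand-in for a pretrained denoiser: the exact noise
    predictor when all data sits at the single point x0_ref."""
    def eps_model(x, a):
        return [(xi - math.sqrt(a) * ri) / math.sqrt(1 - a)
                for xi, ri in zip(x, x0_ref)]
    return eps_model


def ddim_invert(x, eps_model, alpha_bars):
    """DDIM inversion: walk from the least to the most noisy level,
    estimating the noise at the current level before each step."""
    for a_cur, a_next in zip(alpha_bars[:-1], alpha_bars[1:]):
        x = ddim_step(x, eps_model(x, a_cur), a_cur, a_next)
    return x


def ddim_sample(x, eps_model, alpha_bars):
    """Ordinary DDIM sampling back over the same levels in reverse."""
    rev = alpha_bars[::-1]
    for a_cur, a_next in zip(rev[:-1], rev[1:]):
        x = ddim_step(x, eps_model(x, a_cur), a_cur, a_next)
    return x


random.seed(0)
alpha_bars = alpha_bar_schedule(1000)[::20]  # 50 DDIM levels (assumed)
x0_ref = [random.gauss(0, 1) for _ in range(8)]
eps_model = make_toy_eps_model(x0_ref)

noise, errors = {}, {}
for name in ("obj_a", "obj_b"):  # hypothetical per-object latents
    z = [random.gauss(0, 1) for _ in range(8)]
    a0 = alpha_bars[0]
    latent = [math.sqrt(a0) * r + math.sqrt(1 - a0) * zi
              for r, zi in zip(x0_ref, z)]
    noise[name] = ddim_invert(latent, eps_model, alpha_bars)  # "noise features"
    recon = ddim_sample(noise[name], eps_model, alpha_bars)
    errors[name] = max(abs(u - v) for u, v in zip(recon, latent))
    print(name, f"max round-trip error {errors[name]:.1e}")
```

Because this toy predictor returns the same noise estimate at every level along a trajectory, sampling exactly retraces the inversion path, and the inverted latent is built from the very `z` used to create each object latent. A learned denoiser evaluates the noise at slightly different points in the two directions, so real DDIM inversion reconstructs its input only approximately.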

Code & Models

Repositories

SobeyMIL/MVOC
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis

Methods: Softmax · Attention Is All You Need · Diffusion