MVOC: a training-free multiple video object composition method with diffusion models
Wei Wang, Yaosen Chen, Yuegen Liu, Qi Yuan, Shubin Yang, Yanru Zhang

TL;DR
MVOC introduces a training-free diffusion-based method for multi-object video composition that maintains object motion, identity, and interaction effects, outperforming existing approaches.
Contribution
The paper presents a novel training-free diffusion model approach for multi-object video composition that ensures motion and identity consistency and models object interactions.
Findings
Outperforms state-of-the-art methods in video composition tasks.
Maintains object motion and identity in generated videos.
Effectively models interaction effects between objects.
Abstract
Video composition is the core task of video editing. Although image composition based on diffusion models has been highly successful, it is not straightforward to extend the achievement to video object composition tasks, which not only exhibit corresponding interaction effects but also ensure that the objects in the composited video maintain motion and identity consistency, which is necessary to composite a physical harmony video. To address this challenge, we propose a Multiple Video Object Composition (MVOC) method based on diffusion models. Specifically, we first perform DDIM inversion on each video object to obtain the corresponding noise features. Secondly, we combine and edit each object by image editing methods to obtain the first frame of the composited video. Finally, we use the image-to-video generation model to composite the video with feature and attention injections in the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis
MethodsSoftmax · Attention Is All You Need · Diffusion
