InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models

Nirat Saini; Navaneeth Bodla; Ashish Shrivastava; Avinash Ravichandran; Xiao Zhang; Abhinav Shrivastava; Bharat Singh

arXiv:2407.10958·cs.CV·July 16, 2024

Open Access

TL;DR

InVi is a novel video editing method that seamlessly inserts objects into videos using off-the-shelf diffusion models, ensuring high-quality, temporally coherent results without requiring video-specific fine-tuning.

Contribution

The paper introduces InVi, a new approach that combines inpainting and extended-attention diffusion models for realistic, coherent object insertion in videos without fine-tuning.
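One common reading of "extended attention" in video diffusion work is that each frame's queries attend over keys and values gathered from that frame together with an anchor frame, rather than from that frame alone. The NumPy sketch below illustrates the idea under that assumption. The function name `extended_attention`, the single-head layout, and the token counts are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def extended_attention(q_cur, k_cur, v_cur, k_anchor, v_anchor):
    """Single-head attention for one frame (illustrative, not the paper's code).

    q_cur, k_cur, v_cur: (n, d) queries/keys/values of the current frame.
    k_anchor, v_anchor:  (m, d) keys/values of the anchor (inpainted) frame.
    Returns the attended features (n, d) and the attention weights (n, m + n).
    """
    # Assumption: "extended" means the key/value set is the anchor frame's
    # tokens concatenated with the current frame's own tokens.
    k = np.concatenate([k_anchor, k_cur], axis=0)
    v = np.concatenate([v_anchor, v_cur], axis=0)
    scores = q_cur @ k.T / np.sqrt(q_cur.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Toy shapes: 4 tokens in the current frame, 6 in the anchor frame, 8 channels.
rng = np.random.default_rng(0)
n_cur, n_anchor, dim = 4, 6, 8
q_cur = rng.normal(size=(n_cur, dim))
k_cur = rng.normal(size=(n_cur, dim))
v_cur = rng.normal(size=(n_cur, dim))
k_anchor = rng.normal(size=(n_anchor, dim))
v_anchor = rng.normal(size=(n_anchor, dim))

out, weights = extended_attention(q_cur, k_cur, v_cur, k_anchor, v_anchor)
```

Because every frame draws part of its keys and values from the same anchor tokens, the appearance of the inserted object is tied to a single reference, which is one plausible route to temporal coherence without fine-tuning.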

Findings

01. InVi produces realistic object insertions with seamless blending.

02. The method maintains temporal coherence across video frames.

03. InVi outperforms existing video editing techniques in quality and efficiency.

Abstract

We introduce InVi, an approach for inserting or replacing objects within videos (referred to as inpainting) using off-the-shelf, text-to-image latent diffusion models. Unlike existing video editing methods that focus on comprehensive re-styling or entire scene alterations, InVi targets controlled manipulation of objects and blends them seamlessly into a background video. To achieve this goal, we tackle two key challenges. First, for high-quality control and blending, we employ a two-step process involving inpainting and matching. This process begins with inserting the object into a single frame using a ControlNet-based inpainting diffusion model, and then generating subsequent frames conditioned on features from an inpainted frame as an anchor to minimize the domain gap between the background and the object. Second, to ensure temporal coherence, we replace the diffusion model's…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Image and Video Retrieval Techniques

Methods: Focus · Diffusion · Inpainting