InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models
Nirat Saini, Navaneeth Bodla, Ashish Shrivastava, Avinash Ravichandran, Xiao Zhang, Abhinav Shrivastava, Bharat Singh

TL;DR
InVi is a novel video editing method that seamlessly inserts objects into videos using off-the-shelf diffusion models, ensuring high-quality, temporally coherent results without requiring video-specific fine-tuning.
Contribution
The paper introduces InVi, a new approach that combines inpainting and extended-attention diffusion models for realistic, coherent object insertion in videos without fine-tuning.
Findings
InVi produces realistic object insertions with seamless blending.
The method maintains temporal coherence across video frames.
InVi outperforms existing video editing techniques in quality and efficiency.
Abstract
We introduce InVi, an approach for inserting or replacing objects within videos (referred to as inpainting) using off-the-shelf, text-to-image latent diffusion models. InVi targets controlled manipulation of objects and blending them seamlessly into a background video, unlike existing video editing methods that focus on comprehensive re-styling or entire scene alterations. To achieve this goal, we tackle two key challenges. Firstly, for high quality control and blending, we employ a two-step process involving inpainting and matching. This process begins with inserting the object into a single frame using a ControlNet-based inpainting diffusion model, and then generating subsequent frames conditioned on features from an inpainted frame as an anchor to minimize the domain gap between the background and the object. Secondly, to ensure temporal coherence, we replace the diffusion model's…
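The abstract is cut off before it says what is replaced in the diffusion model, so the sketch below does not attempt to reproduce InVi's mechanism. It only illustrates the generic idea of extended attention named in the Contribution summary, under the assumption that each frame's queries attend to the keys and values of an anchor frame in addition to its own. The function name `extended_attention`, the `anchor` parameter, and the tensor shapes are hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def extended_attention(q, k, v, anchor=0):
    """Illustrative extended attention (assumed form, not the paper's code).

    q, k, v: arrays of shape (frames, tokens, dim).
    Returns an array of shape (frames, tokens, dim).
    """
    dim = q.shape[-1]
    # Share the anchor frame's keys and values with every frame.
    k_anchor = np.broadcast_to(k[anchor], k.shape)
    v_anchor = np.broadcast_to(v[anchor], v.shape)
    # Each frame attends to the anchor's tokens plus its own tokens.
    k_ext = np.concatenate([k_anchor, k], axis=1)  # (frames, 2*tokens, dim)
    v_ext = np.concatenate([v_anchor, v], axis=1)
    scores = q @ k_ext.transpose(0, 2, 1) / np.sqrt(dim)  # (frames, tokens, 2*tokens)
    weights = softmax(scores, axis=-1)
    return weights @ v_ext

# Toy usage: 4 frames, 16 tokens per frame, 8 channels.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 16, 8)) for _ in range(3))
out = extended_attention(q, k, v)
print(out.shape)  # (4, 16, 8)
```

In this toy form, every frame draws on the same anchor tokens, which is one plausible way an inpainted anchor frame could keep an inserted object's appearance consistent across frames.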
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Image and Video Retrieval Techniques
Methods: Focus · Diffusion · Inpainting
