Controllable Video Object Insertion via Multiview Priors
Xia Qi, Peishan Cong, Yichen Yao, Ziyi Wang, Yaoqin Ye, Yuexin Ma

TL;DR
This paper introduces a video object insertion method that uses multiview object priors and view-consistent conditioning to improve appearance consistency, occlusion handling, and temporal coherence.
Contribution
It proposes a framework that lifts 2D reference images into multiview representations and employs modules for spatial realism and temporal consistency, addressing the appearance inconsistency and occlusion problems that arise when inserting objects into existing videos.
Findings
Significantly improves the quality of inserted objects in videos.
Ensures stable identity guidance and robust integration across viewpoints.
Effectively handles occlusion and boundary artifacts while maintaining temporal continuity.
Abstract
Video object insertion is the task of placing new objects into existing dynamic scenes. Previous video generation methods focus primarily on synthesizing entire scenes and struggle to ensure consistent object appearance, spatial alignment, and temporal coherence when inserting objects into existing videos. In this paper, we propose a novel solution for Video Object Insertion, which integrates multi-view object priors to address the common challenges of appearance inconsistency and occlusion handling in dynamic environments. By lifting 2D reference images into multi-view representations and leveraging a dual-path view-consistent conditioning mechanism, our framework ensures stable identity guidance and robust integration across diverse viewpoints. A quality-aware weighting mechanism is also employed to adaptively handle noisy or imperfect inputs. Additionally,…
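The abstract does not say how the quality-aware weighting is computed. As a rough illustration only, one common way to down-weight noisy inputs is a softmax over per-view quality scores followed by a weighted average of per-view features. The sketch below assumes that form; the function names, the `temperature` parameter, and the idea of a scalar quality score per view are all hypothetical and are not taken from the paper.

```python
import math


def quality_weights(scores, temperature=1.0):
    """Turn per-view quality scores into weights that sum to 1.

    Assumption: each view has one scalar score, higher meaning cleaner.
    This softmax form is illustrative, not the paper's mechanism.
    """
    if not scores:
        raise ValueError("need at least one view")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(features, scores, temperature=1.0):
    """Weighted average of per-view feature vectors (lists of floats)."""
    if len(features) != len(scores):
        raise ValueError("one quality score is required per view")
    weights = quality_weights(scores, temperature)
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    return [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(dim)
    ]


# Two hypothetical views of the same object; the second is scored as noisier.
fused = fuse_views([[1.0, 0.0], [0.0, 1.0]], scores=[2.0, 0.0])
```

In this sketch a lower `temperature` concentrates the weight on the highest-scoring view, while a higher one approaches a plain average, so noisy views are attenuated rather than discarded.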
