InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

Hoiyeong Jin; Hyojin Jang; Jeongho Kim; Junha Hyung; Kinam Kim; Dongjin Kim; Huijin Choi; Hyeonji Kim; Jaegul Choo

arXiv:2512.17504·cs.CV·December 22, 2025

Open Access

TL;DR

InsertAnywhere is a novel framework that combines 4D scene understanding with diffusion models to achieve realistic, geometrically consistent video object insertion with proper occlusion and lighting effects.

Contribution

The paper introduces a 4D-aware mask generation module and a diffusion-based video synthesis extension, along with the ROSE++ dataset, advancing realistic video object insertion techniques.

Findings

01. Outperforms existing models in realism and coherence

02. Achieves geometrically plausible object insertions

03. Handles occlusion and lighting effects effectively

Abstract

Recent advances in diffusion-based video generation have opened new possibilities for controllable video editing, yet realistic video object insertion (VOI) remains challenging due to limited 4D scene understanding and inadequate handling of occlusion and lighting effects. We present InsertAnywhere, a new VOI framework that achieves geometrically consistent object placement and appearance-faithful video synthesis. Our method begins with a 4D-aware mask generation module that reconstructs the scene geometry and propagates user-specified object placement across frames while maintaining temporal coherence and occlusion consistency. Building upon this spatial foundation, we extend a diffusion-based video generation model to jointly synthesize the inserted object and its surrounding local variations such as illumination and shading. To enable supervised training, we introduce ROSE++, an…
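To make the idea of propagating an object placement across frames with occlusion consistency concrete, here is a minimal sketch in Python with NumPy. It is not the paper's mask generation module. It assumes the inserted object is given as a 3D point set in world coordinates, that each frame has a known pinhole camera (shared intrinsics `K`, per-frame world-to-camera rotation `R` and translation `t`), and that a per-frame scene depth map is available. The function names `visible_object_mask` and `propagate_masks` are hypothetical.

```python
import numpy as np

def visible_object_mask(points_world, K, R, t, scene_depth):
    """Binary mask of object points that are visible in one frame.

    Assumed inputs (illustrative, not from the paper):
      points_world: (N, 3) object points in world coordinates
      K: (3, 3) pinhole intrinsics; R: (3, 3), t: (3,) world-to-camera pose
      scene_depth: (H, W) depth of the existing scene along the camera z-axis
    """
    h, w = scene_depth.shape
    cam = points_world @ R.T + t            # world -> camera coordinates
    cam = cam[cam[:, 2] > 0]                # drop points behind the camera
    uvw = cam @ K.T                         # homogeneous pixel coordinates
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = cam[:, 2]

    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[inside], v[inside], z[inside]

    # Depth test: the object is visible only where it is closer than the scene.
    visible = z < scene_depth[v, u]
    mask = np.zeros((h, w), dtype=bool)
    mask[v[visible], u[visible]] = True
    return mask

def propagate_masks(points_world, K, poses, depths):
    """Apply the same world-space placement to every frame.

    poses: list of (R, t) per frame; depths: list of (H, W) depth maps.
    Returns a (T, H, W) boolean array of per-frame masks.
    """
    return np.stack([
        visible_object_mask(points_world, K, R, t, d)
        for (R, t), d in zip(poses, depths)
    ])
```

Because the placement lives in world space, the same point set yields a mask in every frame, and the depth test removes pixels where existing scene geometry sits in front of the object. A real pipeline would also need a reconstruction step to obtain the poses and depth maps, plus a denser object representation than a sparse point set.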


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques