FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors
Chenxi Li, Weijie Wang, Qiang Li, Bruno Lepri, Nicu Sebe, Weizhi Nie

TL;DR
FreeInsert introduces a novel framework for text-guided 3D object insertion that disentangles object generation from spatial placement, eliminating the need for spatial priors and enabling flexible, realistic scene editing.
Contribution
The paper presents FreeInsert, a method that uses foundation models (MLLMs, LGMs, and diffusion models) to perform unsupervised, spatially precise 3D object insertion guided solely by natural language instructions, with no spatial priors such as 2D masks or 3D bounding boxes.
Findings
Achieves semantically coherent 3D insertions without spatial priors.
Ensures spatially precise and visually realistic object placement.
Provides a user-friendly, flexible scene editing process.
Abstract
Text-driven object insertion in 3D scenes is an emerging task that enables intuitive scene editing through natural language. However, existing 2D editing-based methods often rely on spatial priors such as 2D masks or 3D bounding boxes, and they struggle to ensure consistency of the inserted object. These limitations hinder flexibility and scalability in real-world applications. In this paper, we propose FreeInsert, a novel framework that leverages foundation models including MLLMs, LGMs, and diffusion models to disentangle object generation from spatial placement. This enables unsupervised and flexible object insertion in 3D scenes without spatial priors. FreeInsert starts with an MLLM-based parser that extracts structured semantics, including object types, spatial relationships, and attachment regions, from user instructions. These semantics guide both the reconstruction of the…
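To make the parser output described in the abstract more concrete, here is a minimal Python sketch of a record holding the three kinds of structured semantics it names (object type, spatial relationship, attachment region). It is an illustration only, not the authors' implementation: the class name `InsertionSemantics`, the function `parse_semantics`, the field names, and the assumption that the MLLM replies with a flat JSON object are all hypothetical.

```python
import json
from dataclasses import dataclass

# Hypothetical field names; the paper's actual schema is not given in the abstract.
REQUIRED_FIELDS = ("object_type", "spatial_relation", "attachment_region")


@dataclass(frozen=True)
class InsertionSemantics:
    """Structured semantics extracted from one user instruction (illustrative)."""
    object_type: str        # e.g. "vase"
    spatial_relation: str   # e.g. "on top of"
    attachment_region: str  # e.g. "table surface"


def parse_semantics(raw_json: str) -> InsertionSemantics:
    """Validate an (assumed) JSON reply from an MLLM and build the record."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Every required field must be present and be a non-empty string.
    missing = [
        key for key in REQUIRED_FIELDS
        if not isinstance(data.get(key), str) or not data[key].strip()
    ]
    if missing:
        raise ValueError(f"missing or empty fields: {missing}")
    return InsertionSemantics(**{key: data[key].strip() for key in REQUIRED_FIELDS})


if __name__ == "__main__":
    # Assumed reply for an instruction such as "Put a vase on the table".
    reply = (
        '{"object_type": "vase", '
        '"spatial_relation": "on top of", '
        '"attachment_region": "table surface"}'
    )
    print(parse_semantics(reply))
```

Keeping the object description (`object_type`) separate from the placement fields (`spatial_relation`, `attachment_region`) mirrors the disentanglement of object generation from spatial placement that the abstract describes, although how the real system consumes these fields is not specified here.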
Taxonomy
Topics: Human Motion and Animation · Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction
Methods: Attentive Walk-Aggregating Graph Neural Network · Diffusion
