3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting
Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, Ceyuan Yang

TL;DR
3DitScene introduces a unified 3D scene editing framework that uses language-guided disentangled Gaussian Splatting, enabling precise, flexible control over scene composition and individual objects at multiple levels of granularity.
Contribution
It presents a novel framework combining 3D Gaussian representations with language semantics for versatile scene editing, bridging 2D and 3D manipulation.
Findings
Effective scene editing demonstrated through experiments
Versatile control over scene and object manipulation
Seamless integration of language semantics into 3D scene editing
Abstract
Scene image editing is crucial for entertainment, photography, and advertising design. Existing methods focus solely on either 2D editing of individual objects or 3D editing of the global scene. As a result, there is no unified approach to effectively control and manipulate scenes at the 3D level with different levels of granularity. In this work, we propose 3DitScene, a novel and unified scene editing framework leveraging language-guided disentangled Gaussian Splatting that enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects. We first incorporate 3D Gaussians that are refined through generative priors and optimization techniques. Language features from CLIP then introduce semantics into 3D geometry for object disentanglement. With the disentangled Gaussians, 3DitScene allows for manipulation at both the global and individual levels,…
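The idea of attaching language semantics to Gaussians for object disentanglement can be pictured with a small sketch. It assumes, hypothetically, that each Gaussian stores a feature vector living in the same space as a CLIP text embedding; a text query then selects one object's Gaussians by cosine similarity, and an object-level edit moves only those Gaussians while the rest of the scene stays fixed. The `Gaussian` class, `select_object`, `translate`, the 0.8 threshold, and the toy 2D embeddings are illustrative assumptions, not the authors' implementation. Real CLIP embeddings are far higher-dimensional, and the paper's actual disentanglement procedure may differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """Hypothetical minimal Gaussian: a 3D center plus a language feature."""
    mean: tuple[float, float, float]  # 3D center of the primitive
    feature: tuple[float, ...]        # assumed CLIP-aligned semantic feature


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_object(gaussians: list[Gaussian],
                  text_embedding: tuple[float, ...],
                  threshold: float = 0.8) -> list[int]:
    """Indices of Gaussians whose feature matches the text query (assumed threshold)."""
    return [i for i, g in enumerate(gaussians)
            if cosine(g.feature, text_embedding) >= threshold]


def translate(gaussians: list[Gaussian], indices: list[int],
              offset: tuple[float, float, float]) -> list[Gaussian]:
    """Object-level edit: shift only the selected Gaussians; return a new list."""
    picked = set(indices)
    out = []
    for i, g in enumerate(gaussians):
        if i in picked:
            new_mean = tuple(m + d for m, d in zip(g.mean, offset))
            out.append(Gaussian(new_mean, g.feature))
        else:
            out.append(g)  # background Gaussians are left untouched
    return out


# Usage: two Gaussians belong to the queried object, one is background.
scene = [
    Gaussian((0.0, 0.0, 0.0), (1.0, 0.0)),
    Gaussian((1.0, 0.0, 0.0), (0.9, 0.1)),
    Gaussian((5.0, 5.0, 5.0), (0.0, 1.0)),
]
query = (1.0, 0.0)  # stand-in for a CLIP text embedding of the target object
selected = select_object(scene, query)
edited = translate(scene, selected, (0.0, 0.0, 2.0))
```

With these toy values the first two Gaussians pass the similarity threshold and are lifted by 2 units along z, while the background Gaussian keeps its position. Global-level edits such as camera changes are not covered by this sketch.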
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Focus · Contrastive Language-Image Pre-training
