3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

Qihang Zhang; Yinghao Xu; Chaoyang Wang; Hsin-Ying Lee; Gordon Wetzstein; Bolei Zhou; Ceyuan Yang

arXiv:2405.18424 · cs.CV · May 29, 2024 · 1 cites

Open Access

TL;DR

3DitScene introduces a unified 3D scene editing framework that uses language-guided disentangled Gaussian Splatting, enabling precise, flexible control over scene composition and individual objects at multiple levels of granularity.

Contribution

It presents a novel framework combining 3D Gaussian representations with language semantics for versatile scene editing, bridging 2D and 3D manipulation.

Findings

01. Effective scene editing demonstrated through experiments

02. Versatile control over scene and object manipulation

03. Seamless integration of language semantics into 3D scene editing

Abstract

Scene image editing is crucial for entertainment, photography, and advertising design. Existing methods solely focus on either 2D individual object or 3D global scene editing. This results in a lack of a unified approach to effectively control and manipulate scenes at the 3D level with different levels of granularity. In this work, we propose 3DitScene, a novel and unified scene editing framework leveraging language-guided disentangled Gaussian Splatting that enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects. We first incorporate 3D Gaussians that are refined through generative priors and optimization techniques. Language features from CLIP then introduce semantics into 3D geometry for object disentanglement. With the disentangled Gaussians, 3DitScene allows for manipulation at both the global and individual levels,…
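The abstract's idea of using language features to disentangle objects can be illustrated with a minimal sketch. It assumes each Gaussian already carries a language feature vector and that a text query has been embedded into the same space; the `Gaussian` class, the `select_by_text` and `translate` helpers, the similarity threshold, and the toy three-dimensional features are all hypothetical and are not taken from the paper. Real CLIP features are much higher-dimensional, and the authors' actual selection and editing procedure may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    position: tuple  # (x, y, z) center of the Gaussian
    feature: tuple   # per-Gaussian language feature (toy values, not real CLIP output)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_by_text(gaussians, text_embedding, threshold=0.8):
    """Return indices of Gaussians whose feature matches the text query.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [
        i for i, g in enumerate(gaussians)
        if cosine(g.feature, text_embedding) >= threshold
    ]


def translate(gaussians, indices, offset):
    """Shift only the selected Gaussians, leaving the rest of the scene untouched."""
    chosen = set(indices)
    return [
        Gaussian(tuple(p + o for p, o in zip(g.position, offset)), g.feature)
        if i in chosen else g
        for i, g in enumerate(gaussians)
    ]


# Toy scene: two Gaussians belonging to one object, one belonging to the background.
scene = [
    Gaussian((0.0, 0.0, 1.0), (1.0, 0.0, 0.0)),
    Gaussian((0.1, 0.0, 1.0), (0.9, 0.1, 0.0)),
    Gaussian((2.0, 0.0, 3.0), (0.0, 1.0, 0.0)),
]
query = (1.0, 0.0, 0.0)  # stand-in for an embedded text prompt

selected = select_by_text(scene, query)
edited = translate(scene, selected, (0.5, 0.0, 0.0))
```

In this sketch, an object-level edit is a transform applied to the subset of Gaussians matched by the query, while a global edit would apply the same transform to every Gaussian in the scene.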

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications

Methods: Focus · Contrastive Language-Image Pre-training