3DSceneEditor: Controllable 3D Scene Editing with Gaussian Splatting

Ziyang Yan; Yihua Shao; Minwen Liao; Siyu Chen; Nan Wang; Muyuan Lin; Jenq-Neng Hwang; Hao Zhao; Fabio Remondino; Lei Li

arXiv:2412.01583 · cs.CV · March 24, 2026

TL;DR

3DSceneEditor introduces a novel 3D pipeline using Gaussian Splatting for precise, interactive editing of complex 3D scenes, significantly improving control and efficiency over existing multi-step, diffusion-based methods.

Contribution

It presents a fully 3D-based editing framework that enables direct Gaussian manipulation for high-quality scene edits, integrating semantic labeling and zero-shot grounding for improved control.
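As a rough illustration of what "direct Gaussian manipulation" guided by semantic labels could look like, the sketch below selects Gaussians by a per-Gaussian label and edits them in place of any 2D-to-3D projection step. This is not the authors' implementation: the `Gaussian` class, its `label` field, and the `translate_object` and `remove_object` helpers are hypothetical names introduced here, and a real splat would also carry covariance and colour parameters.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Gaussian:
    """Simplified stand-in for one 3D Gaussian (assumed layout, not the paper's)."""
    mean: Vec3            # 3D centre of the Gaussian
    label: int            # hypothetical per-Gaussian semantic/instance id
    opacity: float = 1.0


def translate_object(scene: List[Gaussian], target_label: int, offset: Vec3) -> List[Gaussian]:
    """Shift every Gaussian carrying target_label by offset; keep the rest unchanged."""
    edited = []
    for g in scene:
        if g.label == target_label:
            new_mean = tuple(m + o for m, o in zip(g.mean, offset))
            edited.append(replace(g, mean=new_mean))
        else:
            edited.append(g)
    return edited


def remove_object(scene: List[Gaussian], target_label: int) -> List[Gaussian]:
    """Delete an object by dropping all Gaussians that carry its label."""
    return [g for g in scene if g.label != target_label]


# Toy scene: two Gaussians belong to object 1, one to object 2.
scene = [
    Gaussian(mean=(0.0, 0.0, 0.0), label=1),
    Gaussian(mean=(1.0, 0.0, 0.0), label=1),
    Gaussian(mean=(5.0, 5.0, 0.0), label=2),
]
edited = translate_object(scene, target_label=1, offset=(0.0, 2.0, 0.0))
without_obj2 = remove_object(scene, target_label=2)
```

In this toy setup the target label is given directly; in the pipeline described above, it would presumably come from the zero-shot grounding step that maps a user's prompt to an object in the scene.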

Findings

01. Outperforms existing methods in editing precision

02. Achieves higher efficiency and interactive performance

03. Establishes new benchmarks for 3D scene editing

Abstract

The creation of 3D scenes has traditionally been both labor-intensive and costly, requiring designers to meticulously configure 3D assets and environments. Recent advancements in generative AI, including text-to-3D and image-to-3D methods, have dramatically reduced the complexity and cost of this process. However, current techniques for editing complex 3D scenes continue to rely on generally interactive multi-step, 2D-to-3D projection methods and diffusion-based techniques, which often lack precision in control and hamper interactive-rate performance. In this work, we propose ***3DSceneEditor***, a fully 3D-based paradigm for interactive-rate, precise editing of intricate 3D scenes using Gaussian Splatting. Unlike conventional methods, 3DSceneEditor operates through a streamlined 3D pipeline, enabling direct Gaussian-based manipulation for efficient, high-quality edits based on input…

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · ALIGN · Contrastive Language-Image Pre-training