DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
Minghao Chen, Iro Laina, Andrea Vedaldi

TL;DR
DGE is a two-stage method for efficient, multi-view consistent 3D editing guided by language instructions. Built on 3D Gaussian Splatting, it aims to be more accurate and faster than iterative approaches.
Contribution
The paper presents a training-free approach for making 2D image editors multi-view consistent, together with a method that directly optimizes the 3D representation, enabling fast, accurate 3D scene editing from language instructions.
Findings
Significantly faster than existing methods
Achieves multi-view consistency without training
Allows selective editing of scene parts
Abstract
We consider the problem of editing 3D objects and scenes based on open-ended language instructions. A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process, obviating the need for 3D data. However, this process is often inefficient due to the need for iterative updates of costly 3D representations, such as neural radiance fields, either through individual view edits or score distillation sampling. A major disadvantage of this approach is the slow convergence caused by aggregating inconsistent information across views, as the guidance from 2D models is not multi-view consistent. We thus introduce the Direct Gaussian Editor (DGE), a method that addresses these issues in two stages. First, we modify a given high-quality image editor like InstructPix2Pix to be multi-view consistent. To do so, we propose a training-free approach that…
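As a rough illustration of why multi-view inconsistent guidance is a problem, the toy sketch below fits a small vector of "scene parameters" to per-view targets by least squares. Everything in it is a hypothetical stand-in and not the paper's method: the linear `renderers`, the `fit_scene` helper, and the Gaussian noise used to mimic inconsistent 2D edits are all assumptions made for illustration. When the per-view targets agree on one underlying scene, the fit explains every view; when they disagree, no single scene can satisfy all of them and a residual remains.

```python
import numpy as np

def fit_scene(renderers, targets):
    """Least-squares fit of toy scene parameters to all views at once.

    `renderers` are hypothetical linear maps from scene parameters to
    per-view pixels; they stand in for a real differentiable renderer.
    """
    A = np.concatenate(renderers, axis=0)   # stack per-view linear "renderers"
    y = np.concatenate(targets, axis=0)     # stack per-view edited targets
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = float(np.sum((A @ x - y) ** 2))
    return x, residual

rng = np.random.default_rng(0)
n_views, n_params, n_pixels = 8, 3, 4
renderers = [rng.normal(size=(n_pixels, n_params)) for _ in range(n_views)]
x_edit = np.array([1.0, -2.0, 0.5])         # the "edited scene" we hope to recover

# Consistent targets: every view is a rendering of the same edited scene.
consistent = [A @ x_edit for A in renderers]
# Inconsistent targets: each view is perturbed independently (assumed noise model).
inconsistent = [t + rng.normal(scale=0.5, size=t.shape) for t in consistent]

x_c, res_c = fit_scene(renderers, consistent)
x_i, res_i = fit_scene(renderers, inconsistent)
print("residual with consistent views:  ", res_c)
print("residual with inconsistent views:", res_i)
```

In this toy setting the consistent targets are fit essentially exactly, while the inconsistent ones leave an error that no choice of scene parameters can remove. This is only an analogy for the conflicting gradients the abstract attributes to view-by-view 2D guidance.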
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Computer Graphics and Visualization Techniques
