Vox-E: Text-guided Voxel Editing of 3D Objects
Etai Sella, Gal Fiebelman, Peter Hedman, Hadar Averbuch-Elor

TL;DR
Vox-E introduces a novel method for editing 3D objects guided by text prompts, combining diffusion models with a new 3D regularization loss to improve control and fidelity in 3D object editing.
Contribution
The paper proposes a new volumetric regularization loss and a cross-attention optimization technique for more precise and diverse text-guided 3D object editing.
Findings
Effective editing of 3D objects guided by text prompts
Outperforms prior methods in fidelity and diversity of edits
Enables complex and constrained 3D object modifications
Abstract
Large scale text-guided diffusion models have garnered significant attention due to their ability to synthesize diverse images that convey complex visual concepts. This generative power has more recently been leveraged to perform text-to-3D synthesis. In this work, we present a technique that harnesses the power of latent diffusion models for editing existing 3D objects. Our method takes oriented 2D images of a 3D object as input and learns a grid-based volumetric representation of it. To guide the volumetric representation to conform to a target text prompt, we follow unconditional text-to-3D methods and optimize a Score Distillation Sampling (SDS) loss. However, we observe that combining this diffusion-guided loss with an image-based regularization loss that encourages the representation not to deviate too strongly from the input object is challenging, as it requires achieving two…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsComputer Graphics and Visualization Techniques · 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction
MethodsDiffusion
