SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds
Minghao Chen, Junyu Xie, Iro Laina, Andrea Vedaldi

TL;DR
Shap-Editor is a fast, feed-forward 3D editing framework that applies an instruction-guided edit in about one second by operating directly on a latent representation of the object, with no per-asset test-time optimization.
Contribution
The paper presents a novel 3D editing method that operates in seconds by leveraging a latent space and a feed-forward network, bypassing traditional optimization-based approaches.
Findings
Operates in approximately one second per edit
Generalizes well to diverse 3D assets and prompts
Achieves comparable quality to optimization-based methods
Abstract
We propose a novel feed-forward 3D editing framework called Shap-Editor. Prior research on editing 3D objects primarily concentrated on editing individual objects by leveraging off-the-shelf 2D image editing networks. This is achieved via a process called distillation, which transfers knowledge from the 2D network to 3D assets. Distillation necessitates at least tens of minutes per asset to attain satisfactory editing results, and is thus not very practical. In contrast, we ask whether 3D editing can be carried out directly by a feed-forward network, eschewing test-time optimisation. In particular, we hypothesise that editing can be greatly simplified by first encoding 3D objects in a suitable latent space. We validate this hypothesis by building upon the latent space of Shap-E. We demonstrate that direct 3D editing in this space is possible and efficient by building a feed-forward…
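To make the contrast with distillation concrete, the snippet below is a minimal sketch of what "feed-forward editing in a latent space" means: one forward pass maps a source latent and an instruction embedding to an edited latent, with no optimisation loop at test time. It is an illustration only, not the Shap-Editor architecture. The names `make_editor` and `edit_latent`, the two-layer MLP, the residual update, the dimensions, and the random weights are all assumptions; in the actual system the latent would come from Shap-E and the editor would be a trained network.

```python
import math
import random


def make_editor(latent_dim, text_dim, hidden_dim, seed=0):
    """Randomly initialised two-layer MLP; a hypothetical stand-in for a trained editor."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": matrix(latent_dim + text_dim, hidden_dim),
        "w2": matrix(hidden_dim, latent_dim),
    }


def matvec(x, w):
    """Compute x @ w for a vector x and a row-major matrix w."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def edit_latent(params, latent, text_embedding):
    """Single forward pass: predict a latent offset and add it to the source latent.

    The residual form (latent + delta) is an assumption made for this sketch.
    """
    x = list(latent) + list(text_embedding)
    hidden = [math.tanh(v) for v in matvec(x, params["w1"])]
    delta = matvec(hidden, params["w2"])
    return [z + d for z, d in zip(latent, delta)]


# Toy usage: an 8-dimensional "object latent" and a 4-dimensional "instruction embedding".
params = make_editor(latent_dim=8, text_dim=4, hidden_dim=16)
source_latent = [0.0] * 8
instruction = [1.0] * 4
edited_latent = edit_latent(params, source_latent, instruction)
```

The cost of an edit here is a fixed number of multiply-adds, independent of the asset, which is the property that lets a feed-forward editor avoid the per-asset optimisation that distillation requires.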
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Cell Image Analysis Techniques
