SHAP-EDITOR: Instruction-guided Latent 3D Editing in Seconds

Minghao Chen; Junyu Xie; Iro Laina; Andrea Vedaldi

arXiv:2312.09246 · cs.CV · December 15, 2023 · 1 citation

TL;DR

Shap-Editor is a fast, feed-forward 3D editing framework that edits assets directly in a latent space, taking about one second per edit and removing the need for time-consuming per-asset optimization.

Contribution

The paper presents a 3D editing method that runs in seconds: a feed-forward network edits objects directly in the latent space of Shap-E, bypassing traditional optimization-based approaches.

Findings

01 Operates in approximately one second per edit

02 Generalizes well to diverse 3D assets and prompts

03 Achieves comparable quality to optimization-based methods

Abstract

We propose a novel feed-forward 3D editing framework called Shap-Editor. Prior research on editing 3D objects primarily concentrated on editing individual objects by leveraging off-the-shelf 2D image editing networks. This is achieved via a process called distillation, which transfers knowledge from the 2D network to 3D assets. Distillation necessitates at least tens of minutes per asset to attain satisfactory editing results, and is thus not very practical. In contrast, we ask whether 3D editing can be carried out directly by a feed-forward network, eschewing test-time optimisation. In particular, we hypothesise that editing can be greatly simplified by first encoding 3D objects in a suitable latent space. We validate this hypothesis by building upon the latent space of Shap-E. We demonstrate that direct 3D editing in this space is possible and efficient by building a feed-forward…
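The contrast the abstract draws, many optimisation steps per asset versus a single forward pass, can be illustrated with a toy sketch. Everything below is a hypothetical stand-in and not the Shap-Editor architecture: a "latent" is a short list of floats, an "instruction" is reduced to a fixed offset vector, and the names `optimise_per_asset` and `feed_forward_edit` are invented for illustration.

```python
# Toy illustration only: these functions are hypothetical and do not
# reproduce Shap-Editor, Shap-E, or any distillation method.

def optimise_per_asset(latent, target, steps=200, lr=0.1):
    """Test-time optimisation: repeat gradient steps for each asset.

    Minimises 0.5 * ||z - target||^2, whose gradient is (z - target).
    Returns the edited latent and the number of update steps used.
    """
    z = list(latent)
    for _ in range(steps):
        z = [zi - lr * (zi - ti) for zi, ti in zip(z, target)]
    return z, steps


def feed_forward_edit(latent, instruction_offset, weight=1.0):
    """Feed-forward editing: one pass of an (assumed pre-trained) function.

    Here the "network" is just a weighted offset added to the latent.
    Returns the edited latent and the number of passes used (always 1).
    """
    edited = [zi + weight * oi for zi, oi in zip(latent, instruction_offset)]
    return edited, 1


if __name__ == "__main__":
    latent = [0.0, 1.0, -2.0]
    offset = [0.5, -0.5, 1.0]  # stand-in for an encoded instruction
    target = [z + o for z, o in zip(latent, offset)]

    z_opt, n_opt = optimise_per_asset(latent, target)
    z_ff, n_ff = feed_forward_edit(latent, offset)
    print("optimisation steps:", n_opt, "feed-forward passes:", n_ff)
```

In this toy setting both routes reach essentially the same edited latent, but the optimisation route pays its cost again for every asset, while the feed-forward route pays it once at training time (not shown here).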

Datasets

taesiri/arxiv_qa (dataset, 193 downloads)

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Cell Image Analysis Techniques