ShapeUP: Scalable Image-Conditioned 3D Editing
Inbar Gat, Dana Cohen-Bar, Guy Levy, Elad Richardson, Daniel Cohen-Or

TL;DR
ShapeUP introduces a scalable 3D editing framework that formulates editing as supervised latent-to-latent translation with a 3D Diffusion Transformer, enabling precise, controllable, and consistent modifications of 3D assets.
Contribution
The paper presents a novel image-conditioned 3D editing method that builds on a pretrained 3D foundation model and uses supervised training to improve control and scalability.
Findings
Outperforms existing methods in identity preservation and edit fidelity.
Enables fine-grained, mask-free local and global 3D edits.
Maintains structural consistency with original assets.
Abstract
Recent advancements in 3D foundation models have enabled the generation of high-fidelity assets, yet precise 3D manipulation remains a significant challenge. Existing 3D editing frameworks often face a difficult trade-off between visual controllability, geometric consistency, and scalability. Specifically, optimization-based methods are prohibitively slow, multi-view 2D propagation techniques suffer from visual drift, and training-free latent manipulation methods are inherently bound by frozen priors and cannot directly benefit from scaling. In this work, we present ShapeUP, a scalable, image-conditioned 3D editing framework that formulates editing as a supervised latent-to-latent translation within a native 3D representation. This formulation allows ShapeUP to build on a pretrained 3D foundation model, leveraging its strong generative prior while adapting it to editing through…
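The abstract frames editing as supervised latent-to-latent translation: a model sees a source latent plus an image condition and is trained on paired examples to output the edited latent. The sketch below illustrates only that general training setup on toy data, under stated assumptions. The latents are random vectors rather than ShapeUP's native 3D representation, the translator is a hypothetical linear residual map rather than the pretrained 3D Diffusion Transformer, and the objective is plain mean squared error fitted by gradient descent. The names `make_toy_pairs`, `predict`, and `train` are hypothetical and do not come from the paper.

```python
import numpy as np


def make_toy_pairs(n, d_latent, d_cond, rng):
    """Hypothetical paired data: (source latent, condition, edited latent).

    The 'edit' is a condition-dependent offset added to the source latent,
    standing in for real (original asset, edit image, edited asset) triples.
    """
    true_map = rng.normal(size=(d_cond, d_latent))
    z_src = rng.normal(size=(n, d_latent))
    cond = rng.normal(size=(n, d_cond))
    z_tgt = z_src + cond @ true_map
    return z_src, cond, z_tgt, true_map


def predict(W, z_src, cond):
    # Assumed residual parameterisation: start from the source latent and
    # add a condition-dependent edit, so W = 0 returns the source unchanged.
    return z_src + cond @ W


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


def train(z_src, cond, z_tgt, lr=0.5, steps=300):
    """Fit W by full-batch gradient descent on the supervised MSE loss."""
    W = np.zeros((cond.shape[1], z_src.shape[1]))
    losses = []
    for _ in range(steps):
        residual = predict(W, z_src, cond) - z_tgt  # shape (n, d_latent)
        losses.append(mse(predict(W, z_src, cond), z_tgt))
        # Gradient of mean((cond @ W + z_src - z_tgt) ** 2) with respect to W.
        grad = 2.0 * cond.T @ residual / residual.size
        W -= lr * grad
    return W, losses


rng = np.random.default_rng(0)
z_src, cond, z_tgt, true_map = make_toy_pairs(256, 8, 4, rng)
W, losses = train(z_src, cond, z_tgt)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.2e}")
```

Because the toy translator is residual, an untrained (all-zero) model reproduces the source latent exactly, a loose analogue of identity preservation; the excerpt above does not say whether ShapeUP parameterises its edits this way.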
