Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions
Etai Sella, Hao Phung, Nitay Amiel, Or Litany, Or Patashnik, Hadar Averbuch-Elor

TL;DR
Prox-E is a training-free, primitive-based framework for 3D shape editing. It applies precise, localized modifications while preserving the shape's overall identity, using a pretrained vision-language model to specify the structural changes.
Contribution
Prox-E is the first to combine primitive-based shape abstraction with pretrained vision-language models for fine-grained, training-free 3D shape editing.
Findings
Balances preservation of shape identity against fidelity to the requested edit.
Outperforms existing 2D-based and training-based 3D editing methods.
Enables localized structural modifications without retraining.
Abstract
Text-based 2D image editing models have recently reached an impressive level of maturity, motivating a growing body of work that heavily depends on these models to drive 3D edits. While effective for appearance-based modifications, such 2D-centric 3D editing pipelines often struggle with fine-grained 3D editing, where localized structural changes must be applied while strictly preserving an object's overall identity. To address this limitation, we propose Prox-E, a training-free framework that enables fine-grained 3D control through an explicit, primitive-based geometric abstraction. Our framework first abstracts an input 3D shape into a compact set of geometric primitives. A pretrained vision-language model (VLM) then edits this abstraction to specify primitive-level changes. These structural edits are subsequently used to guide a 3D generative model, enabling fine-grained, localized…
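Illustrative sketch
The abstract describes three stages: abstract the shape into primitives, let a VLM specify primitive-level changes, then use the edited abstraction to guide a 3D generative model. The Python sketch below illustrates only the middle stage, and only as a toy. Everything in it is an assumption made for illustration and is not taken from the paper: the box-like `Primitive` record, the dictionary edit format, and the helper names `apply_edits` and `changed_names` are all hypothetical. The edit list is hard-coded where a VLM would supply it, and no generative model is involved.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Set, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Primitive:
    """Hypothetical box-like primitive; the paper's primitive type may differ."""
    name: str
    center: Vec3
    size: Vec3  # full extents along x, y, z


def apply_edits(primitives: List[Primitive], edits: List[Dict]) -> List[Primitive]:
    """Apply primitive-level edits and return a new list; the input is untouched."""
    by_name = {p.name: p for p in primitives}
    for edit in edits:
        op = edit["op"]
        if op == "add":
            new = edit["primitive"]
            by_name[new.name] = new
            continue
        target = by_name[edit["target"]]
        if op == "scale":
            size = tuple(s * f for s, f in zip(target.size, edit["factor"]))
            by_name[target.name] = replace(target, size=size)
        elif op == "translate":
            center = tuple(c + d for c, d in zip(target.center, edit["offset"]))
            by_name[target.name] = replace(target, center=center)
        elif op == "remove":
            del by_name[target.name]
        else:
            raise ValueError(f"unknown op: {op}")
    return list(by_name.values())


def changed_names(before: List[Primitive], after: List[Primitive]) -> Set[str]:
    """Names of primitives that were added, removed, or modified."""
    b = {p.name: p for p in before}
    a = {p.name: p for p in after}
    return {n for n in b.keys() | a.keys() if b.get(n) != a.get(n)}


# Toy abstraction of a chair (made-up numbers, not data from the paper).
chair = [
    Primitive("seat", (0.0, 0.5, 0.0), (1.0, 0.1, 1.0)),
    Primitive("back", (0.0, 1.0, -0.45), (1.0, 1.0, 0.1)),
    Primitive("leg_front_left", (-0.4, 0.25, 0.4), (0.1, 0.5, 0.1)),
]

# Stand-in for an instruction such as "make the backrest taller".
edits = [{"op": "scale", "target": "back", "factor": (1.0, 1.5, 1.0)}]

edited = apply_edits(chair, edits)
print(changed_names(chair, edited))
```

In this toy setting, an explicit abstraction makes locality easy to check: comparing the primitive sets before and after the edit shows which parts changed, and every other primitive is carried over unmodified.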
