Vinedresser3D: Agentic Text-guided 3D Editing
Yankuan Chi, Xiang Li, Zixuan Huang, James M. Rehg

TL;DR
Vinedresser3D is a framework for text-guided editing of 3D assets that operates directly in the latent space of a native 3D generative model. It uses a multimodal large language model to interpret the editing prompt and an edited image of a selected view as visual guidance.
Contribution
The paper introduces an agentic, multi-step approach to 3D editing that combines language understanding, view selection, visual guidance, and latent-space inpainting. The authors report that it outperforms prior methods.
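The four stages named above can be read as a simple chain in which each stage consumes the output of the previous one. The following is a minimal sketch of that control flow only, not the authors' implementation: the function `edit_asset`, its parameter names, and the stub stages in the usage part are all hypothetical, and the real stages would wrap a multimodal LLM, an image editing model, and a 3D generative model.

```python
from typing import Any, Callable, List


def edit_asset(
    asset: Any,
    prompt: str,
    understand: Callable[[Any, str], Any],         # hypothetical: language understanding stage
    select_view: Callable[[Any, Any], Any],        # hypothetical: view selection stage
    edit_image: Callable[[Any, str], Any],         # hypothetical: visual guidance stage
    inpaint_latent: Callable[[Any, Any, Any], Any] # hypothetical: latent-space inpainting stage
) -> Any:
    """Run the four stages in order and return the edited asset."""
    plan = understand(asset, prompt)           # what to edit and where
    view = select_view(asset, plan)            # a rendering that shows the edit region
    guide = edit_image(view, prompt)           # edited image used as visual guidance
    return inpaint_latent(asset, plan, guide)  # edit applied in latent space


# Usage with stand-in stages that only record the order in which they run.
calls: List[str] = []


def _stub(name: str, result: Any) -> Callable[..., Any]:
    def stage(*_args: Any) -> Any:
        calls.append(name)
        return result
    return stage


result = edit_asset(
    asset="chair.latent",
    prompt="add armrests",
    understand=_stub("understand", {"region": "sides"}),
    select_view=_stub("select_view", "side_view.png"),
    edit_image=_stub("edit_image", "side_view_edited.png"),
    inpaint_latent=_stub("inpaint_latent", "chair_edited.latent"),
)
print(calls, result)
```

Passing the stages in as callables keeps the sketch independent of any particular model or library; the asset name and prompt in the usage part are placeholders.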
Findings
Outperforms prior baselines in automatic metrics
Achieves higher human preference scores
Enables mask-free, precise 3D editing
Abstract
Text-guided 3D editing aims to modify existing 3D assets using natural-language instructions. Current methods struggle to jointly understand complex prompts, automatically localize edits in 3D, and preserve unedited content. We introduce Vinedresser3D, an agentic framework for high-quality text-guided 3D editing that operates directly in the latent space of a native 3D generative model. Given a 3D asset and an editing prompt, Vinedresser3D uses a multimodal large language model to infer rich descriptions of the original asset, identify the edit region and edit type (addition, modification, deletion), and generate decomposed structural and appearance-level text guidance. The agent then selects an informative view and applies an image editing model to obtain visual guidance. Finally, an inversion-based rectified-flow inpainting pipeline with an interleaved sampling module performs editing…
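To make the idea of inversion-based inpainting concrete, here is a toy sketch of the general technique on a four-element "latent". It is written under several assumptions and is not the paper's pipeline: the velocity field `toy_velocity` is a made-up stand-in for a learned rectified-flow model, time is assumed to run from t=0 (noise) to t=1 (data), integration uses plain Euler steps, and the interleaved sampling module is not modeled. Inversion integrates the flow backward from the asset's latent and records every intermediate state; sampling then runs forward under the new condition, but at each step the elements outside the edit mask are overwritten with the recorded states, so they return exactly to their original values.

```python
from typing import Callable, List

Vec = List[float]
Velocity = Callable[[Vec, float, Vec], Vec]


def toy_velocity(x: Vec, t: float, cond: Vec) -> Vec:
    """Hypothetical velocity field: pull each element toward its condition value."""
    return [c - xi for xi, c in zip(x, cond)]


def invert(latent: Vec, cond: Vec, velocity: Velocity, steps: int) -> List[Vec]:
    """Euler-integrate backward from t=1 to t=0.

    Returns a trajectory where trajectory[k] is the state at t = k / steps,
    so trajectory[-1] is the original latent.
    """
    dt = 1.0 / steps
    x = list(latent)
    trajectory = [x]
    for k in range(steps, 0, -1):
        v = velocity(x, k * dt, cond)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
        trajectory.append(x)
    trajectory.reverse()
    return trajectory


def inpaint(trajectory: List[Vec], edit_mask: List[bool],
            new_cond: Vec, velocity: Velocity) -> Vec:
    """Euler-integrate forward from t=0 to t=1 under the new condition.

    Elements with edit_mask False are reset to the recorded inversion
    state after every step, which preserves the unedited content.
    """
    steps = len(trajectory) - 1
    dt = 1.0 / steps
    x = list(trajectory[0])
    for k in range(steps):
        v = velocity(x, k * dt, new_cond)
        stepped = [xi + dt * vi for xi, vi in zip(x, v)]
        x = [s if m else kept
             for s, kept, m in zip(stepped, trajectory[k + 1], edit_mask)]
    return x


latent = [1.0, 2.0, 3.0, 4.0]            # stand-in for the asset's latent
source_cond = [0.0, 0.0, 0.0, 0.0]       # stand-in for the original description
edit_cond = [0.0, 0.0, 10.0, 10.0]       # stand-in for the edit guidance
edit_mask = [False, False, True, True]   # True marks the region to edit

trajectory = invert(latent, source_cond, toy_velocity, steps=10)
edited = inpaint(trajectory, edit_mask, edit_cond, toy_velocity)
print(edited)
```

In this toy the first two elements come back bit-for-bit unchanged because they are copied from the recorded trajectory, while the last two are driven by the new condition. In a real system the mask would come from the identified edit region and the conditions from the text and visual guidance.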
Taxonomy
Topics: 3D Shape Modeling and Analysis · Additive Manufacturing and 3D Printing Technologies · Innovations in Concrete and Construction Materials
