Feedforward 3D Editing via Text-Steerable Image-to-3D
Ziqi Ma, Hongqiao Chen, Yisong Yue, Georgia Gkioxari

TL;DR
Steer3D is a fast, feedforward method for editing image-generated 3D assets with text instructions, improving control and fidelity in 3D model customization.
Contribution
The paper adapts ControlNet to image-to-3D generation so that text steering happens directly in a forward pass. The approach is supported by a scalable data engine for automatic data generation and a two-stage training recipe based on flow-matching training and Direct Preference Optimization (DPO).
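The ControlNet idea referenced above can be illustrated with a toy example. The sketch below is not Steer3D's architecture, which this summary does not describe; it only shows the general ControlNet pattern of a frozen base layer plus a trainable conditioning branch whose output projection starts at zero, so the combined model initially reproduces the base model exactly. The class name `SteerableBlock`, the layer sizes, and the plain-list linear layers are all hypothetical choices made for illustration.

```python
import random


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class SteerableBlock:
    """Toy ControlNet-style block (hypothetical, for illustration only).

    A frozen base layer stands in for one block of a pretrained
    image-to-3D model. A trainable branch reads the hidden state together
    with a text embedding, and its output passes through a projection
    initialized to zero before being added back as a residual.
    """

    def __init__(self, dim, text_dim, seed=0):
        rng = random.Random(seed)
        # Frozen base weights (would come from the pretrained model).
        self.base_w = random_matrix(dim, dim, rng)
        self.base_b = [0.0] * dim
        # Trainable control branch over [hidden state, text embedding].
        self.ctrl_w = random_matrix(dim, dim + text_dim, rng)
        self.ctrl_b = [0.0] * dim
        # Zero-initialized projection: contributes nothing before training.
        self.zero_w = [[0.0] * dim for _ in range(dim)]
        self.zero_b = [0.0] * dim

    def base_forward(self, h):
        """Output of the frozen base layer alone."""
        return linear(self.base_w, self.base_b, h)

    def forward(self, h, text_emb):
        """Base output plus the text-conditioned residual."""
        base_out = self.base_forward(h)
        ctrl_hidden = linear(self.ctrl_w, self.ctrl_b, h + text_emb)
        residual = linear(self.zero_w, self.zero_b, ctrl_hidden)
        return [b + r for b, r in zip(base_out, residual)]


if __name__ == "__main__":
    block = SteerableBlock(dim=4, text_dim=3)
    h = [0.1, 0.2, 0.3, 0.4]
    text = [1.0, -1.0, 0.5]
    # At initialization the text branch has no effect on the output.
    print(block.forward(h, text) == block.base_forward(h))
```

The zero initialization is what lets the conditioning branch be attached to a pretrained model without disturbing its behavior at the start of training; the text signal only takes effect as the projection weights move away from zero.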
Findings
Follows language instructions more faithfully than competing methods
Maintains better consistency with the original 3D asset
Edits 2.4x to 28.5x faster than competing methods
Abstract
Recent progress in image-to-3D has opened up immense possibilities for design, AR/VR, and robotics. However, to use AI-generated 3D assets in real applications, a critical requirement is the capability to edit them easily. We present a feedforward method, Steer3D, to add text steerability to image-to-3D models, which enables editing of generated 3D assets with language. Our approach is inspired by ControlNet, which we adapt to image-to-3D generation to enable text steering directly in a forward pass. We build a scalable data engine for automatic data generation, and develop a two-stage training recipe based on flow-matching training and Direct Preference Optimization (DPO). Compared to competing methods, Steer3D more faithfully follows the language instruction and maintains better consistency with the original 3D asset, while being 2.4x to 28.5x faster. Steer3D demonstrates that it is…
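The two training objectives named in the abstract can be written down in their generic textbook forms. The sketch below is an assumption-heavy illustration, not the paper's recipe: the flow-matching loss uses the common linear-interpolation variant, and the DPO loss is the original log-likelihood formulation, whereas the abstract does not say how either is instantiated for a 3D generative model. All function names and the `beta` default are hypothetical.

```python
import math


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity.

    Assumes the linear-interpolation variant of flow matching, where the
    target velocity along the path is the constant x1 - x0.
    """
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = [b - a for a, b in zip(x0, x1)]
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO loss for one preference pair.

    Computes -log(sigmoid(beta * margin)), where the margin is how much
    more the trained model favors the preferred sample over the rejected
    one, relative to a frozen reference model.
    """
    margin = (logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)
    # -log(sigmoid(z)) equals log(1 + exp(-z)).
    return math.log1p(math.exp(-beta * margin))


if __name__ == "__main__":
    x0, x1 = [0.0, 0.0], [2.0, 4.0]
    # A predictor that always outputs the true velocity has zero loss.
    print(flow_matching_loss(lambda x, t: [2.0, 4.0], x0, x1, t=0.5))
    # With no preference margin, the DPO loss equals log(2).
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))
```

In a two-stage setup of this kind, the first objective would teach the model to generate the edited asset, and the preference objective would then push it toward outputs judged better than alternatives; which samples count as preferred in Steer3D is not stated in this summary.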
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation
