Feedforward 3D Editing via Text-Steerable Image-to-3D

Ziqi Ma; Hongqiao Chen; Yisong Yue; Georgia Gkioxari

arXiv:2512.13678 · cs.CV · December 16, 2025


TL;DR

Steer3D adds text steerability to image-to-3D models, so 3D assets generated from images can be edited with language instructions in a single fast, feedforward pass.

Contribution

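The paper adapts ControlNet to image-to-3D generation so that a text instruction can steer the model directly in a forward pass. Training data come from a scalable, automatic data engine, and training follows a two-stage recipe: flow-matching training followed by Direct Preference Optimization (DPO). The result follows instructions more faithfully and edits faster than competing methods.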
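ControlNet attaches a trainable copy of a pretrained network's blocks to the frozen original and joins the two through zero-initialized layers, so training starts from the unmodified base model. The toy sketch below shows that idea for a single block, with a text embedding as the control signal. It is an illustration under stated assumptions, not Steer3D's architecture: the class `ControlledBlock`, the tanh block, and all dimensions are hypothetical, and this page does not say how Steer3D wires the branch into its image-to-3D backbone.

```python
import numpy as np

rng = np.random.default_rng(0)


class ControlledBlock:
    """Hypothetical toy ControlNet-style block: frozen base path plus a zero-initialized, text-conditioned branch."""

    def __init__(self, dim, text_dim):
        # Frozen base weights (stand-in for one pretrained image-to-3D block).
        self.w_base = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.b_base = np.zeros(dim)
        # Trainable copy, initialized from the frozen weights as in ControlNet.
        self.w_copy = self.w_base.copy()
        self.b_copy = self.b_base.copy()
        # Zero-initialized layers that inject the text signal and return the branch output.
        self.w_zero_in = np.zeros((text_dim, dim))
        self.w_zero_out = np.zeros((dim, dim))

    def base(self, h):
        # Output of the frozen path alone.
        return np.tanh(h @ self.w_base + self.b_base)

    def __call__(self, h, text_emb):
        copy_in = h + text_emb @ self.w_zero_in            # add the projected text condition
        copy_out = np.tanh(copy_in @ self.w_copy + self.b_copy)
        return self.base(h) + copy_out @ self.w_zero_out   # branch output added to the frozen path


block = ControlledBlock(dim=8, text_dim=16)
h = rng.standard_normal((4, 8))      # hypothetical hidden states for 4 tokens
text = rng.standard_normal((4, 16))  # hypothetical text embeddings
print(np.allclose(block(h, text), block.base(h)))  # True: at initialization the branch is a no-op
```

Because both joining layers start at zero, the steered block initially reproduces the base block's output exactly; this is the property that lets ControlNet-style training add a new control signal without first disrupting the pretrained generator.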

Findings

01. Follows language instructions more faithfully than competing methods.

02. Maintains better consistency with the original 3D asset.

03. Edits 2.4x to 28.5x faster than competing methods.

Abstract

Recent progress in image-to-3D has opened up immense possibilities for design, AR/VR, and robotics. However, to use AI-generated 3D assets in real applications, a critical requirement is the capability to edit them easily. We present a feedforward method, Steer3D, to add text steerability to image-to-3D models, which enables editing of generated 3D assets with language. Our approach is inspired by ControlNet, which we adapt to image-to-3D generation to enable text steering directly in a forward pass. We build a scalable data engine for automatic data generation, and develop a two-stage training recipe based on flow-matching training and Direct Preference Optimization (DPO). Compared to competing methods, Steer3D more faithfully follows the language instruction and maintains better consistency with the original 3D asset, while being 2.4x to 28.5x faster. Steer3D demonstrates that it is…
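The abstract names the two training stages but not their exact losses. The sketch below assumes a rectified-flow (straight-path) flow-matching objective for stage one and a Diffusion-DPO-style preference loss adapted to velocity prediction for stage two. Both formulations, and names such as `flow_matching_loss`, `dpo_flow_loss`, and `beta`, are assumptions rather than Steer3D's published recipe; samples are flat vectors standing in for whatever 3D representation the model actually generates.

```python
import numpy as np

rng = np.random.default_rng(0)


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1), plus that path's constant velocity."""
    return (1.0 - t) * x0 + t * x1, x1 - x0


def flow_matching_loss(velocity_fn, x1, cond, rng):
    """Stage 1 (assumed form): regress the predicted velocity onto the path velocity."""
    x0 = rng.standard_normal(x1.shape)          # Gaussian noise
    t = rng.uniform(size=(x1.shape[0], 1))      # one time per example in [0, 1)
    xt, target = interpolate(x0, x1, t)
    return np.mean((velocity_fn(xt, t, cond) - target) ** 2)


def per_example_error(velocity_fn, xt, t, cond, target):
    # Mean squared velocity error for each example in the batch.
    return np.mean((velocity_fn(xt, t, cond) - target) ** 2, axis=1)


def dpo_flow_loss(policy_fn, ref_fn, x_win, x_lose, cond, beta, rng):
    """Stage 2 (assumed form): Diffusion-DPO-style loss on a preferred/rejected pair."""
    x0 = rng.standard_normal(x_win.shape)       # noise shared by both samples
    t = rng.uniform(size=(x_win.shape[0], 1))   # times shared by both samples
    gaps = []
    for x1 in (x_win, x_lose):
        xt, target = interpolate(x0, x1, t)
        # Policy error minus frozen-reference error on this sample.
        gaps.append(per_example_error(policy_fn, xt, t, cond, target)
                    - per_example_error(ref_fn, xt, t, cond, target))
    margin = -beta * (gaps[0] - gaps[1])        # positive when the policy improves more on the winner
    return np.mean(np.logaddexp(0.0, -margin))  # mean of -log(sigmoid(margin)), computed stably


def zero_velocity(xt, t, cond):
    return np.zeros_like(xt)  # trivial stand-in for a velocity network


x_win = rng.standard_normal((4, 8))   # hypothetical preferred edits
x_lose = rng.standard_normal((4, 8))  # hypothetical rejected edits
print(flow_matching_loss(zero_velocity, x_win, None, rng))
print(dpo_flow_loss(zero_velocity, zero_velocity, x_win, x_lose, None, beta=1.0, rng=rng))  # log(2) ≈ 0.693 when policy equals reference
```

Here the preferred and rejected samples share the same noise and time, so the comparison reflects the samples rather than the random draw; `beta` sets how strongly the policy is tied to the reference model.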


Code & Models

Models

🤗 ziqima/Steer3D (Hugging Face model)



Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation