TIMI: Training-Free Image-to-3D Multi-Instance Generation with Spatial Fidelity
Xiao Cai, Lianli Gao, Pengpeng Zeng, Ji Zhang, Heng Tao Shen, Jingkuan Song

TL;DR
TIMI is a training-free framework for image-to-3D multi-instance generation that exploits the spatial priors already present in pre-trained image-to-3D models to achieve high spatial fidelity.
Contribution
The paper introduces TIMI, a training-free approach built from two modules, Instance-aware Separation Guidance (ISG) and SGU, which improve instance disentanglement and spatial fidelity in image-to-3D multi-instance generation.
Findings
Outperforms existing methods in global layout accuracy
Achieves better local instance separation
Operates without additional training, enabling faster inference
Abstract
Precise spatial fidelity in Image-to-3D multi-instance generation is critical for downstream real-world applications. Recent work attempts to address this by fine-tuning pre-trained Image-to-3D (I23D) models on multi-instance datasets, which incurs substantial training overhead and struggles to guarantee spatial fidelity. In fact, we observe that pre-trained I23D models already possess meaningful spatial priors, which remain underutilized as evidenced by instance entanglement issues. Motivated by this, we propose TIMI, a novel Training-free framework for Image-to-3D Multi-Instance generation that achieves high spatial fidelity. Specifically, we first introduce an Instance-aware Separation Guidance (ISG) module, which facilitates instance disentanglement during the early denoising stage. Next, to stabilize the guidance introduced by ISG, we devise a Spatial-stabilized Geometry-adaptive…
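The abstract describes ISG only at a high level, as guidance that encourages instance disentanglement during the early denoising stage. As a rough illustration of that general pattern (not TIMI's actual ISG, whose formulation is not given here), the Python sketch below restricts a gradient-based guidance step to the first fraction of denoising steps. Everything in it is hypothetical: instances are reduced to 2D centroids, the "separation energy" is a toy sum of pairwise Gaussian overlaps, and the names `early_stage_mask`, `separation_energy_and_grad`, `guided_sampling`, `early_fraction`, `step_size`, and `sigma` are illustrative only.

```python
import numpy as np


def early_stage_mask(num_steps: int, early_fraction: float) -> list[bool]:
    """Hypothetical schedule: True for the first `early_fraction` of steps.

    Step 0 is assumed to be the noisiest (earliest) denoising step.
    """
    cutoff = int(round(num_steps * early_fraction))
    return [i < cutoff for i in range(num_steps)]


def separation_energy_and_grad(centers: np.ndarray, sigma: float):
    """Toy energy: sum over instance pairs of exp(-||c_i - c_j||^2 / sigma^2).

    The energy is large when instance centroids overlap; its gradient
    with respect to each centroid is returned alongside it.
    """
    n = centers.shape[0]
    energy = 0.0
    grad = np.zeros_like(centers, dtype=float)
    for i in range(n):
        for j in range(i + 1, n):
            diff = centers[i] - centers[j]
            w = np.exp(-np.dot(diff, diff) / sigma**2)
            energy += w
            g = -2.0 * w / sigma**2 * diff  # dE/dc_i; dE/dc_j is -g
            grad[i] += g
            grad[j] -= g
    return energy, grad


def guided_sampling(centers, num_steps=50, early_fraction=0.3,
                    step_size=0.5, sigma=1.0, denoise=None):
    """Run a (toy) denoising loop, applying separation guidance only early.

    `denoise` is an optional stand-in for the model's per-step update;
    with None, only the guidance steps modify the state.
    """
    x = np.asarray(centers, dtype=float).copy()
    for use_guidance in early_stage_mask(num_steps, early_fraction):
        if denoise is not None:
            x = denoise(x)
        if use_guidance:
            # Gradient descent on the overlap energy pushes instances apart.
            _, grad = separation_energy_and_grad(x, sigma)
            x = x - step_size * grad
    return x
```

Gating by step index is one simple way to confine guidance to the early, layout-forming part of sampling while leaving later steps to the unmodified model; in a real pipeline the energy would instead be derived from model internals.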
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Cell Image Analysis Techniques
