TIMI: Training-Free Image-to-3D Multi-Instance Generation with Spatial Fidelity

Xiao Cai; Lianli Gao; Pengpeng Zeng; Ji Zhang; Heng Tao Shen; Jingkuan Song

arXiv:2603.01371·cs.CV·March 3, 2026

TL;DR

TIMI is a training-free framework that enhances image-to-3D multi-instance generation by leveraging pre-trained models' spatial priors, ensuring high spatial fidelity without additional training.

Contribution

The paper introduces TIMI, a training-free approach whose two modules, Instance-aware Separation Guidance (ISG) and SGU, improve spatial fidelity and instance disentanglement in image-to-3D multi-instance generation.

Findings

01. Outperforms existing methods in global layout accuracy
02. Achieves better local instance separation
03. Operates without additional training, enabling faster inference

Abstract

Precise spatial fidelity in Image-to-3D multi-instance generation is critical for downstream real-world applications. Recent work attempts to address this by fine-tuning pre-trained Image-to-3D (I23D) models on multi-instance datasets, which incurs substantial training overhead and struggles to guarantee spatial fidelity. In fact, we observe that pre-trained I23D models already possess meaningful spatial priors, which remain underutilized as evidenced by instance entanglement issues. Motivated by this, we propose TIMI, a novel Training-free framework for Image-to-3D Multi-Instance generation that achieves high spatial fidelity. Specifically, we first introduce an Instance-aware Separation Guidance (ISG) module, which facilitates instance disentanglement during the early denoising stage. Next, to stabilize the guidance introduced by ISG, we devise a Spatial-stabilized Geometry-adaptive…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Vision and Imaging · Cell Image Analysis Techniques