Controllable 3D Object Generation with Single Image Prompt

Jaeseok Lee; Jaekoo Lee

arXiv:2511.22194 · cs.CV · December 1, 2025

Open Access

TL;DR

This paper introduces two methods for controllable 3D object generation from a single image prompt. They eliminate the need for textual inversion and improve 3D consistency, and a user study supports the results.

Contribution

The paper proposes using an off-the-shelf image adapter and a depth-conditioned warmup strategy for improved, controllable 3D object generation without textual inversion.

Findings

01 Comparable performance to textual-inversion methods

02 Enhanced 3D consistency in generated objects

03 User study confirms improved control and quality

Abstract

Recently, diffusion models have demonstrated impressive generative capabilities, producing images with remarkable fidelity. In particular, existing methods for 3D object generation, one of the fastest-growing areas of computer vision, predominantly use text-to-image diffusion models with textual inversion, which trains a pseudo text prompt to describe the given image. In practice, various text-to-image generative models employ textual inversion to learn the concepts or styles of a target object in the pseudo text prompt embedding space, thereby generating sophisticated outputs. However, textual inversion requires additional training time and offers limited controllability. To tackle these issues, we propose two innovative methods: (1) using an off-the-shelf image adapter that generates 3D objects without textual inversion, offering enhanced control over conditions such…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis