Controllable 3D Object Generation with Single Image Prompt
Jaeseok Lee, Jaekoo Lee

TL;DR
This paper introduces two methods for controllable 3D object generation from a single image prompt. They eliminate the need for textual inversion and improve 3D consistency, and a user study supports the results.
Contribution
The paper proposes using an off-the-shelf image adapter and a depth-conditioned warmup strategy for improved, controllable 3D object generation without textual inversion.
Findings
Comparable performance to textual inversion methods
Enhanced 3D consistency in generated objects
User study confirms improved control and quality
Abstract
Recently, diffusion models have demonstrated impressive generative capabilities, producing images with remarkable fidelity. In particular, existing methods for 3D object generation, one of the fastest-growing areas of computer vision, predominantly use text-to-image diffusion models with textual inversion, which trains a pseudo text prompt to describe the given image. In practice, various text-to-image generative models employ textual inversion to learn concepts or styles of the target object in the pseudo text prompt embedding space, thereby generating sophisticated outputs. However, textual inversion requires additional training time and offers limited controllability. To tackle these issues, we propose two innovative methods: (1) using an off-the-shelf image adapter that generates 3D objects without textual inversion, offering enhanced control over conditions such…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Face recognition and analysis
