3DreamBooth: High-Fidelity 3D Subject-Driven Video Generation Model
Hyun-kyu Ko, Jihyeon Park, Younghyun Kim, Dongheok Park, and Eunbyung Park

TL;DR
This paper introduces 3DreamBooth and 3Dapter, a novel framework for high-fidelity, 3D-aware video generation of customized subjects that overcomes limitations of 2D-centric methods by decoupling spatial geometry from temporal motion.
Contribution
The paper proposes a 1-frame optimization approach that embeds 3D priors into the model, and introduces 3Dapter for fine-grained multi-view texture enhancement and faster convergence.
Findings
Enables 3D-aware video customization without extensive multi-view datasets.
Improves spatial consistency and texture detail in generated videos.
Reduces training time through efficient multi-view optimization.
Abstract
Creating dynamic, view-consistent videos of customized subjects is highly sought after for a wide range of emerging applications, including immersive VR/AR, virtual production, and next-generation e-commerce. However, despite rapid progress in subject-driven video generation, existing methods predominantly treat subjects as 2D entities, focusing on transferring identity through single-view visual features or textual prompts. Because real-world subjects are inherently 3D, applying these 2D-centric approaches to 3D object customization reveals a fundamental limitation: they lack the comprehensive spatial priors necessary to reconstruct the 3D geometry. Consequently, when synthesizing novel views, they must rely on generating plausible but arbitrary details for unseen regions, rather than preserving the true 3D identity. Achieving genuine 3D-aware customization remains challenging due to…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation
