DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Yujie Wei, Shiwei Zhang, Zhiwu Qing, Hangjie Yuan, Zhiheng Liu, Yu Liu, Yingya Zhang, Jingren Zhou, Hongming Shan

TL;DR
DreamVideo is a two-stage approach to personalized video generation built on a pre-trained video diffusion model. It learns subject appearance from a few static images and target motion from a few videos, and is reported to outperform existing methods.
Contribution
The paper proposes a decoupled framework that separates subject learning (textual inversion plus an identity adapter) from motion learning (a motion adapter), enabling flexible, high-quality customized video synthesis from minimal input data.
Findings
Outperforms state-of-the-art methods in personalized video generation
Effectively captures fine subject details from limited images
Accurately models target motion patterns from few videos
Abstract
Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory in the challenging video generation task, as it requires the controllability of both subjects and motions. To that end, we present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject and a few videos of target motion. DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model. The subject learning aims to accurately capture the fine appearance of the subject from provided images, which is achieved by combining textual inversion and fine-tuning of our carefully designed identity adapter. In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern. Combining…
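The abstract describes fine-tuning small adapters on top of a pre-trained video diffusion model. The excerpt does not specify the adapter architecture, so the following is only a generic bottleneck-adapter sketch in plain Python, not DreamVideo's actual design. The class name `BottleneckAdapter`, the GELU activation, the zero-initialised up-projection, and the dimensions are all assumptions chosen for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gelu(v):
    """Exact GELU activation, computed with the error function."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


class BottleneckAdapter:
    """Hypothetical residual adapter: h + up(gelu(down(h))).

    Only `down` and `up` would be trained; the pre-trained backbone
    that produces `h` stays frozen. This is an assumed, generic layout.
    """

    def __init__(self, dim, bottleneck, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(dim)
        # Down-projection: small random weights, shape (bottleneck, dim).
        self.down = [[rng.gauss(0.0, scale) for _ in range(dim)]
                     for _ in range(bottleneck)]
        # Up-projection: zeros, shape (dim, bottleneck), so the adapter
        # is an identity map before any fine-tuning has happened.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h):
        z = [gelu(v) for v in matvec(self.down, h)]
        delta = matvec(self.up, z)
        # Residual connection: the frozen feature plus a learned offset.
        return [hi + di for hi, di in zip(h, delta)]

    def num_trainable(self):
        return (len(self.down) * len(self.down[0])
                + len(self.up) * len(self.up[0]))


# Stand-in for a feature vector coming out of a frozen backbone layer.
h = [0.5, -1.0, 2.0, 0.0, 1.5, -0.25, 3.0, -2.0]
adapter = BottleneckAdapter(dim=8, bottleneck=2)
out = adapter(h)
```

With the up-projection at zero, `out` equals `h`, so attaching the adapter leaves the pre-trained model's behaviour unchanged until training moves the weights. The trainable parameter count here is 2 × 8 + 8 × 2 = 32, which illustrates why adapter fine-tuning is cheap relative to updating a full backbone.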
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization
Methods: Diffusion · Adapter
