Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data
Tao Yang, Yangming Shi, Yunwen Huang, Feng Chen, Yin Zheng, Lei Zhang

TL;DR
Factorized-Dreamer demonstrates that high-quality video generation can be achieved using limited, low-quality data by factorizing the process and employing specialized modules, reducing the need for large-scale high-quality datasets.
Contribution
The paper introduces a novel factorized spatiotemporal framework for text-to-video generation that effectively trains on limited low-quality data without recaptioning or finetuning.
Findings
Generates high-quality video effectively from limited, low-quality (LQ) datasets.
Reduces dependence on large-scale HQ video-text pairs.
Achieves competitive results in T2V and image-to-video tasks.
Abstract
Text-to-video (T2V) generation has gained significant attention due to its wide applications in video generation, editing, enhancement, translation, etc. However, high-quality (HQ) video synthesis is extremely challenging because of the diverse and complex motions that exist in the real world. Most existing works try to address this problem by collecting large-scale HQ videos, which are inaccessible to the community. In this work, we show that publicly available limited and low-quality (LQ) data are sufficient to train an HQ video generator without recaptioning or finetuning. We factorize the whole T2V generation process into two steps: generating an image conditioned on a highly descriptive caption, and synthesizing the video conditioned on the generated image and a concise caption of motion details. Specifically, we present Factorized-Dreamer, a factorized spatiotemporal…
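The two-step factorization described in the abstract can be pictured as a simple composition of two generators. The sketch below is only an illustration of that control flow, not the authors' implementation: `FactorizedT2V`, `stub_text_to_image`, and `stub_image_to_video` are hypothetical names, the "images" are toy 2x2 grids of floats, and the choice to make frame 0 identical to the generated image is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]  # toy stand-in for an image: a 2-D grid of intensities
Video = List[Image]        # a video is a list of frames


@dataclass
class FactorizedT2V:
    """Hypothetical wrapper that chains the two stages named in the abstract."""
    text_to_image: Callable[[str], Image]
    image_to_video: Callable[[Image, str, int], Video]

    def generate(self, descriptive_caption: str, motion_caption: str,
                 num_frames: int = 8) -> Video:
        # Step 1: image conditioned on a highly descriptive caption.
        image = self.text_to_image(descriptive_caption)
        # Step 2: video conditioned on that image and a concise motion caption.
        return self.image_to_video(image, motion_caption, num_frames)


def stub_text_to_image(caption: str) -> Image:
    # Placeholder "generator": maps the caption length to a flat gray level.
    level = (len(caption) % 10) / 10.0
    return [[level, level], [level, level]]


def stub_image_to_video(image: Image, motion_caption: str, num_frames: int) -> Video:
    # Placeholder "motion": brightens the image a little more on each frame.
    # Frame 0 equals the input image (an assumption of this sketch).
    step = 0.01 * len(motion_caption.split())
    return [[[px + t * step for px in row] for row in image]
            for t in range(num_frames)]


pipeline = FactorizedT2V(stub_text_to_image, stub_image_to_video)
frames = pipeline.generate(
    descriptive_caption="a red fox standing in fresh snow at dawn",
    motion_caption="the fox turns its head",
    num_frames=4,
)
print(len(frames), len(frames[0]), len(frames[0][0]))
```

In a real system the two stubs would be replaced by a text-to-image model and an image-conditioned video model; the point here is only that the descriptive caption feeds the first stage while the short motion caption feeds the second.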
Taxonomy
Topics: Video Coding and Compression Technologies · Video Analysis and Summarization · Advanced Data Compression Techniques
Methods: Gated Linear Unit · Byte Pair Encoding · Inverse Square Root Schedule · Softmax · Linear Layer · Attention Dropout · SentencePiece · Dense Connections · Dropout
