Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data

Tao Yang; Yangming Shi; Yunwen Huang; Feng Chen; Yin Zheng; Lei Zhang

arXiv:2408.10119·cs.CV·August 20, 2024

TL;DR

Factorized-Dreamer shows that high-quality video generation can be achieved with limited, low-quality data by factorizing generation into an image-synthesis step and a video-synthesis step and employing specialized modules, reducing the need for large-scale high-quality datasets.

Contribution

The paper introduces a novel factorized spatiotemporal framework for text-to-video generation that effectively trains on limited low-quality data without recaptioning or finetuning.

Findings

01. Effective high-quality video generation from limited low-quality (LQ) datasets.
02. Reduced dependence on large-scale high-quality (HQ) video-text pairs.
03. Competitive results on text-to-video (T2V) and image-to-video tasks.

Abstract

Text-to-video (T2V) generation has gained significant attention due to its wide applications to video generation, editing, enhancement and translation, etc. However, high-quality (HQ) video synthesis is extremely challenging because of the diverse and complex motions that exist in the real world. Most existing works attempt to address this problem by collecting large-scale HQ videos, which are inaccessible to the community. In this work, we show that publicly available limited and low-quality (LQ) data are sufficient to train an HQ video generator without recaptioning or finetuning. We factorize the whole T2V generation process into two steps: generating an image conditioned on a highly descriptive caption, and synthesizing the video conditioned on the generated image and a concise caption of motion details. Specifically, we present Factorized-Dreamer, a factorized spatiotemporal…
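The two-step factorization described in the abstract can be illustrated with a minimal Python sketch. It is not the Factorized-Dreamer architecture, whose modules are not described in this excerpt. FactorizedT2V, toy_text_to_image and toy_image_to_video are hypothetical names, and the two toy functions are placeholders for trained text-to-image and image-to-video models. What the sketch shows is the interface: stage 1 sees only the highly descriptive caption, and stage 2 sees only the generated image plus the concise motion caption.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy data types: an "image" is a 2D grid of floats; a "video" is a list of images.
Image = List[List[float]]
Video = List[Image]


@dataclass
class FactorizedT2V:
    """Hypothetical two-stage T2V pipeline (illustrative, not the paper's code)."""

    # Stage 1 (assumed interface): descriptive caption -> key image.
    text_to_image: Callable[[str], Image]
    # Stage 2 (assumed interface): (image, motion caption, frame count) -> frames.
    image_to_video: Callable[[Image, str, int], Video]

    def generate(self, descriptive_caption: str, motion_caption: str,
                 num_frames: int = 8) -> Video:
        # Appearance and layout are decided once, from the detailed caption.
        image = self.text_to_image(descriptive_caption)
        # Motion is conditioned on that image plus a concise motion description.
        return self.image_to_video(image, motion_caption, num_frames)


def toy_text_to_image(caption: str, size: int = 4) -> Image:
    # Placeholder for a trained T2I model: a deterministic gradient image
    # whose offset depends on the caption length.
    offset = len(caption) % 10
    return [[float(offset + col) for col in range(size)] for _ in range(size)]


def toy_image_to_video(image: Image, motion: str, num_frames: int) -> Video:
    # Placeholder for a trained I2V model: "motion" is a horizontal circular
    # shift of one pixel per frame, leftward if the caption mentions "left".
    step = -1 if "left" in motion.lower() else 1
    frames: Video = []
    for t in range(num_frames):
        frame = []
        for row in image:
            k = (step * t) % len(row)  # right-rotation amount for frame t
            frame.append(row[-k:] + row[:-k] if k else list(row))
        frames.append(frame)
    return frames


pipeline = FactorizedT2V(toy_text_to_image, toy_image_to_video)
video = pipeline.generate(
    "A red fox with thick winter fur standing in fresh snow, soft morning light",
    "the fox walks left",
    num_frames=4,
)
print(len(video), len(video[0]), len(video[0][0]))  # 4 4 4
```

In this sketch the two stages communicate only through the key image, so either callable can be replaced independently without changing generate.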


Taxonomy

Topics: Video Coding and Compression Technologies · Video Analysis and Summarization · Advanced Data Compression Techniques

Methods: Gated Linear Unit · Byte Pair Encoding · Inverse Square Root Schedule · Softmax · Linear Layer · Attention Dropout · SentencePiece · Dense Connections · Dropout