ModelGrow: Continual Text-to-Video Pre-training with Model Expansion and Language Understanding Enhancement
Zhefan Rao, Liya Ji, Yazhou Xing, Runtao Liu, Zhaoyang Liu, Jiaxin Xie, Ziqiao Peng, Yingqing He, Qifeng Chen

TL;DR
ModelGrow introduces a continual pre-training approach for text-to-video models that expands their capacity and enhances semantic understanding by integrating large language models, leading to improved generation performance with limited resources.
Contribution
The paper presents the first systematic exploration of continual pre-training for T2V models, combining model expansion techniques with language understanding enhancements.
Findings
ModelGrow improves T2V generation quality across multiple metrics.
The method enhances semantic alignment with complex prompts.
Model expansion and language integration significantly boost performance.
Abstract
Text-to-video (T2V) generation has gained significant attention recently. However, training a T2V model from scratch remains costly, and there is considerable room to improve generation performance, especially under limited computational resources. This work explores continual general pre-training of text-to-video models, which lets a model "grow" its abilities on top of a pre-trained foundation, much as humans acquire new knowledge by building on past experience. Continual pre-training techniques have not yet been studied extensively in T2V generation. In this work, we take the first step toward exploring this task systematically and propose ModelGrow. Specifically, we break the task into two key aspects: increasing model capacity and improving semantic understanding. For model capacity, we introduce several novel techniques to expand the…
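The abstract is cut off before it describes ModelGrow's expansion techniques, so the sketch below does not reproduce them. It is a minimal, hypothetical illustration of one generic way to increase the capacity of a pre-trained network without discarding what it already computes: inserting new residual blocks whose output projection is zero-initialized, so the grown model initially produces exactly the same outputs as the original. The toy model and the names `make_block`, `expand_depth`, and `forward`, as well as the "insert after every k blocks" schedule, are assumptions made for illustration only.

```python
import random

# Hypothetical toy model: a stack of residual MLP blocks acting on plain Python lists.
# This illustrates generic function-preserving depth expansion; it is NOT ModelGrow's method.

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def relu(v):
    return [max(0.0, a) for a in v]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def zero_matrix(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def make_block(dim, hidden, rng, zero_out=False):
    """A residual block computing x -> x + W_out @ relu(W_in @ x).

    With zero_out=True, W_out is all zeros, so the block is exactly the identity at init.
    """
    w_in = random_matrix(hidden, dim, rng)
    w_out = zero_matrix(dim, hidden) if zero_out else random_matrix(dim, hidden, rng)
    return (w_in, w_out)

def forward(blocks, x):
    """Run the input vector through every residual block in order."""
    for w_in, w_out in blocks:
        delta = matvec(w_out, relu(matvec(w_in, x)))
        x = [a + d for a, d in zip(x, delta)]
    return x

def expand_depth(blocks, dim, hidden, every, rng):
    """Insert a zero-initialized block after every `every` existing blocks (hypothetical schedule)."""
    expanded = []
    for i, block in enumerate(blocks, start=1):
        expanded.append(block)
        if i % every == 0:
            expanded.append(make_block(dim, hidden, rng, zero_out=True))
    return expanded

# Usage: grow a 4-block "pre-trained" model to 6 blocks and check its outputs are unchanged.
rng = random.Random(0)
dim, hidden = 4, 8
pretrained = [make_block(dim, hidden, rng) for _ in range(4)]
grown = expand_depth(pretrained, dim, hidden, every=2, rng=rng)
x = [0.5, -1.0, 2.0, 0.1]
print(len(pretrained), len(grown))                  # 4 6
print(forward(pretrained, x) == forward(grown, x))  # True
```

Zero-initializing only the output projection is a common design choice for this kind of growth: the new block starts as an identity map, yet its output projection receives nonzero gradients whenever some hidden units are active, so continued pre-training can begin to use the added capacity immediately. The input projection starts receiving gradients once the output projection moves away from zero.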
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Softmax · Attention Is All You Need
