MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images

Junchen Zhu; Huan Yang; Huiguo He; Wenjing Wang; Zixi Tuo; Wen-Huang Cheng; Lianli Gao; Jingkuan Song; Jianlong Fu

arXiv:2306.07257·cs.CV·June 13, 2023·1 cites

Open Access

TL;DR

MovieFactory is a framework that automatically generates multi-scene, cinematic movies with synchronized sound from natural language input, using large generative models for language and images and retrieving audio to match each scene.

Contribution

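It introduces what the authors describe, to the best of their knowledge, as the first fully automated movie generation model: it creates cinematic-picture (3072 × 1280), multi-scene movies with sound from simple text, whereas existing methods produce soundless, single-scene videos of modest quality.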

Findings

1. Produces realistic, diverse, multi-scene movies with synchronized audio

2. Uses a two-stage process for video generation involving spatial finetuning and temporal learning

3. Demonstrates high-quality results with immersive visual and auditory experiences

Abstract

In this paper, we present MovieFactory, a powerful framework to generate cinematic-picture (3072 $\times$ 1280), film-style (multi-scene), and multi-modality (sounding) movies on the demand of natural languages. As the first fully automated movie generation model to the best of our knowledge, our approach empowers users to create captivating movies with smooth transitions using simple text inputs, surpassing existing methods that produce soundless videos limited to a single scene of modest quality. To facilitate this distinctive functionality, we leverage ChatGPT to expand user-provided text into detailed sequential scripts for movie generation. Then we bring scripts to life visually and acoustically through vision generation and audio retrieval. To generate videos, we extend the capabilities of a pretrained text-to-image diffusion model through a two-stage process. Firstly, we employ…
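The flow the abstract describes (text expanded into a sequential script, then each scene rendered visually and paired with retrieved audio) can be pictured with a minimal Python sketch. This is only an illustration of the data flow, not the authors' implementation: every name here (`Scene`, `Clip`, `make_movie`, and the `toy_*` stand-ins) is hypothetical, the split of each scene into a separate visual and audio prompt is an assumption, and the three stages are stubbed with string placeholders rather than real language-model, diffusion, or retrieval calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    """One script entry (hypothetical structure, not from the paper)."""
    visual_prompt: str  # what the scene should show
    audio_prompt: str   # what the scene should sound like


@dataclass
class Clip:
    """A scene paired with its placeholder video and audio results."""
    scene: Scene
    video: str
    audio: str


def make_movie(
    user_text: str,
    expand_script: Callable[[str], List[Scene]],
    generate_video: Callable[[str], str],
    retrieve_audio: Callable[[str], str],
) -> List[Clip]:
    """Expand the text into scenes, then build one clip per scene, in order."""
    clips = []
    for scene in expand_script(user_text):
        video = generate_video(scene.visual_prompt)
        audio = retrieve_audio(scene.audio_prompt)
        clips.append(Clip(scene, video, audio))
    return clips


# Toy stand-ins (assumptions): a real system would call a language model,
# a video generation model, and an audio retrieval index here.
def toy_expand_script(user_text: str) -> List[Scene]:
    return [
        Scene(f"{user_text}, scene {i}", f"ambient sound for scene {i}")
        for i in (1, 2)
    ]


def toy_generate_video(prompt: str) -> str:
    return f"<video: {prompt}>"


def toy_retrieve_audio(prompt: str) -> str:
    return f"<audio: {prompt}>"


movie = make_movie(
    "a sailboat crossing a stormy sea",
    toy_expand_script,
    toy_generate_video,
    toy_retrieve_audio,
)
for clip in movie:
    print(clip.video, "|", clip.audio)
```

Passing the three stages in as callables keeps the orchestration independent of any particular model, so each stub could be swapped for a real component without changing `make_movie`.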

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Music and Audio Processing

Methods: ALIGN · Diffusion