VideoStudio: Generating Consistent-Content and Multi-Scene Videos

Fuchen Long; Zhaofan Qiu; Ting Yao; Tao Mei

arXiv:2401.01256 · cs.CV · September 17, 2024 · 6 cites

Open Access · 1 Repo

TL;DR

VideoStudio introduces a novel framework that leverages large language models and diffusion techniques to generate multi-scene videos with consistent content and logical scene transitions, surpassing existing methods in quality and coherence.

Contribution

The paper presents a new approach combining LLMs and diffusion models for multi-scene video generation with content consistency and scene logic management.

Findings

01. Outperforms state-of-the-art models in visual quality

02. Achieves higher content consistency across scenes

03. Receives better user preference scores

Abstract

Recent breakthroughs in diffusion models have significantly expanded the possibilities for generating high-quality videos from given prompts. Most existing works tackle the single-scene scenario, with only one video event occurring in a single background. Extending generation to multi-scene videos is nevertheless non-trivial: it requires managing the logic between scenes while preserving the consistent visual appearance of key content across them. In this paper, we propose a novel framework, namely VideoStudio, for consistent-content and multi-scene video generation. Technically, VideoStudio leverages Large Language Models (LLMs) to convert the input prompt into a comprehensive multi-scene script that benefits from the logical knowledge learnt by the LLM. The script for each scene includes a prompt describing the event, the foreground/background entities, as…
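
As a rough illustration of the per-scene script the abstract describes, one entry might be modeled as below. This is only a sketch: the class, field, and function names are hypothetical and are not taken from the paper or its repository, the example scenes are made up, and because the abstract is truncated, only the fields it explicitly names (event prompt, foreground entities, background entities) are included.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SceneScript:
    """Hypothetical per-scene script entry; names are illustrative only."""
    event_prompt: str  # text describing the event in this scene
    foreground_entities: List[str] = field(default_factory=list)
    background_entities: List[str] = field(default_factory=list)


def shared_entities(scenes: List[SceneScript]) -> List[str]:
    """Return foreground entities that occur in more than one scene.

    These are the entities whose visual appearance would need to stay
    consistent across scenes (an assumption about how such a script
    could be used, not the paper's procedure).
    """
    counts: Dict[str, int] = {}
    for scene in scenes:
        # Use a set so an entity listed twice in one scene counts once.
        for name in set(scene.foreground_entities):
            counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up two-scene script, purely for illustration.
script = [
    SceneScript("A chef chops vegetables", ["chef"], ["kitchen"]),
    SceneScript("The chef serves a dish", ["chef", "dish"], ["dining room"]),
]
recurring = shared_entities(script)
```

Here `recurring` holds the entities that appear in both scenes, which is the kind of key content the abstract says must keep a consistent appearance.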

Code & Models

Repositories

fuchenustc/videostudio
PyTorch · Official

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion