FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax
Yu Lu, Linchao Zhu, Hehe Fan, Yi Yang

TL;DR
FlowZero is a zero-shot text-to-video framework in which a large language model generates a dynamic scene syntax from the text description; that syntax then guides an image diffusion model toward coherent videos with vivid motion.
Contribution
The paper combines LLMs with image diffusion models to generate temporally coherent videos guided by a dynamic scene syntax, and adds an iterative self-refinement step that improves the alignment between the generated spatio-temporal layouts and the text prompt.
Findings
Improved zero-shot video coherence and motion quality.
Effective use of scene syntax to guide diffusion models.
Enhanced global coherence through motion-enriched noise initialization.
Abstract
Text-to-video (T2V) generation is a rapidly growing research area that aims to translate the scenes, objects, and actions within complex video text into a sequence of coherent visual frames. We present FlowZero, a novel framework that combines Large Language Models (LLMs) with image diffusion models to generate temporally-coherent videos. FlowZero uses LLMs to understand complex spatio-temporal dynamics from text, where LLMs can generate a comprehensive dynamic scene syntax (DSS) containing scene descriptions, object layouts, and background motion patterns. These elements in DSS are then used to guide the image diffusion model for video generation with smooth object motions and frame-to-frame coherence. Moreover, FlowZero incorporates an iterative self-refinement process, enhancing the alignment between the spatio-temporal layouts and the textual prompts for the videos. To enhance…
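As a rough illustration of what a dynamic scene syntax might contain, the sketch below models the three elements the abstract names (scene descriptions, object layouts, and a background motion pattern) as plain Python data classes. All class names, field names, the box format, and the smoothness check are hypothetical and chosen for illustration only; the paper's actual DSS format and its self-refinement criterion are not specified here.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), normalized to [0, 1].
Box = Tuple[float, float, float, float]


@dataclass
class FrameSyntax:
    """Per-frame part of the syntax: a scene description plus an object layout."""
    description: str
    layout: Dict[str, Box]  # object name -> bounding box


@dataclass
class DynamicSceneSyntax:
    """Assumed container for the three DSS elements named in the abstract."""
    prompt: str
    background_motion: str  # e.g. "left", "right", "static" (assumed vocabulary)
    frames: List[FrameSyntax] = field(default_factory=list)


def box_center(box: Box) -> Tuple[float, float]:
    """Return the centre point of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def max_center_jump(dss: DynamicSceneSyntax, name: str) -> float:
    """Largest displacement of an object's box centre between consecutive
    frames in which the object appears. A large value hints at a jerky
    layout; this is an illustrative check, not the paper's refinement step."""
    centers = [box_center(f.layout[name]) for f in dss.frames if name in f.layout]
    jumps = [
        math.hypot(b[0] - a[0], b[1] - a[1])
        for a, b in zip(centers, centers[1:])
    ]
    return max(jumps, default=0.0)


# Example: an object drifting to the right by 0.1 per frame.
example = DynamicSceneSyntax(
    prompt="a boat sails across a lake",
    background_motion="left",
    frames=[
        FrameSyntax("boat at the left shore", {"boat": (0.0, 0.4, 0.2, 0.6)}),
        FrameSyntax("boat mid-lake", {"boat": (0.1, 0.4, 0.3, 0.6)}),
        FrameSyntax("boat nearing the right shore", {"boat": (0.2, 0.4, 0.4, 0.6)}),
    ],
)
```

A structure like this keeps the LLM output machine-checkable: a layout whose objects jump too far between frames could be sent back to the LLM for another pass, which is one plausible reading of the self-refinement idea described above.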
Taxonomy
Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization
Methods: Diffusion
