AI Powered High Quality Text to Video Generation with Enhanced Temporal Consistency
Piyushkumar Patel

TL;DR
This paper introduces MOVAI, a hierarchical framework for text-to-video generation that significantly improves temporal consistency, visual quality, and semantic control by integrating scene understanding and diffusion models.
Contribution
The paper presents a novel hierarchical framework with a scene parser, an attention mechanism, and a refinement module, advancing the state of the art in text-to-video synthesis.
Findings
Achieves a 15.3% improvement in the LPIPS perceptual quality metric.
Improves FVD by 12.7%, indicating better temporal coherence.
Outperforms existing methods in user preference studies by 18.9%.
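LPIPS and FVD are both lower-is-better metrics, so an "improvement" here means a reduction in the score. The summary does not state how the percentages were derived; the sketch below assumes they are relative changes against a baseline score. The function name `relative_improvement` and the example scores are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, ours: float,
                         lower_is_better: bool = True) -> float:
    """Return the percentage improvement of `ours` over `baseline`.

    Assumption: improvement is measured relative to the baseline score.
    For lower-is-better metrics (e.g. LPIPS, FVD) a drop counts as a gain.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = (baseline - ours) if lower_is_better else (ours - baseline)
    return 100.0 * delta / baseline


# Hypothetical scores, chosen only to illustrate the arithmetic.
print(round(relative_improvement(baseline=400.0, ours=350.0), 1))  # 12.5
```

Under this convention, a higher-is-better score such as a user preference rate would be passed with `lower_is_better=False`.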
Abstract
Text-to-video generation has emerged as a critical frontier in generative artificial intelligence, yet existing approaches struggle to maintain temporal consistency, compositional understanding, and fine-grained control over visual narratives. We present MOVAI (Multimodal Original Video AI), a novel hierarchical framework that integrates compositional scene understanding with temporal-aware diffusion models for high-fidelity text-to-video synthesis. Our approach introduces three key innovations: (1) a Compositional Scene Parser (CSP) that decomposes textual descriptions into hierarchical scene graphs with temporal annotations, (2) a Temporal-Spatial Attention Mechanism (TSAM) that ensures coherent motion dynamics across frames while preserving spatial details, and (3) a Progressive Video Refinement (PVR) module that iteratively enhances video quality through multi-scale temporal…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Human Motion and Animation
