A Survey: Spatiotemporal Consistency in Video Generation
Zhiyu Yin, Kehai Chen, Xuefeng Bai, Ruili Jiang, Juntao Li, Hongdong Li, Jin Liu, Yang Xiang, Jun Yu, Min Zhang

TL;DR
This survey reviews recent advances in video generation with a focus on maintaining spatiotemporal consistency, covering models, techniques, benchmarks, and open challenges in improving the quality and coherence of generated videos.
Contribution
It provides a systematic review of methods, frameworks, and evaluation metrics for spatiotemporal consistency in video generation, highlighting current progress and future directions.
Findings
Analysis of various generation models and their effectiveness
Comparison of feature representations and training strategies
Identification of key challenges and promising research directions
Abstract
Video generation aims to produce temporally coherent sequences of visual frames, representing a pivotal advancement in Artificial Intelligence Generated Content (AIGC). Compared to static image generation, video generation poses unique challenges: it demands not only high-quality individual frames but also strong temporal coherence to ensure consistency throughout the spatiotemporal sequence. Although research addressing spatiotemporal consistency in video generation has increased in recent years, systematic reviews focusing on this core issue remain relatively scarce. To fill this gap, this paper views the video generation task as a sequential sampling process from a high-dimensional spatiotemporal distribution, and further discusses spatiotemporal consistency. We provide a systematic review of the latest advancements in the field. The content spans multiple dimensions including…
Taxonomy
Topics: Multimedia Communication and Technology · 3D Modeling in Geospatial Applications · Video Analysis and Summarization
