Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss
Xinyu Zhang, Zicheng Duan, Dong Gong, Lingqiao Liu

TL;DR
This paper introduces a training-free method for motion-guided video generation. A novel motion consistency loss improves temporal coherence without additional training or model modifications.
Contribution
It proposes a simple, effective motion consistency loss that captures inter-frame feature correlations to enhance temporal coherence in training-free video generation.
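To make the idea of a loss built on inter-frame feature correlations concrete, here is a minimal sketch in plain Python. It is not the paper's formulation, which this summary does not give. It assumes each frame is a list of per-position feature vectors, that the correlation between consecutive frames is measured by cosine similarity, and that the loss is the mean squared difference between the correlation maps of a generated video and a reference video. All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def inter_frame_correlation(frame_a, frame_b):
    """Entry [i][j] compares position i of frame_a with position j of frame_b."""
    return [[cosine(u, v) for v in frame_b] for u in frame_a]

def correlation_patterns(video_feats):
    """One correlation map per pair of consecutive frames."""
    return [
        inter_frame_correlation(video_feats[t], video_feats[t + 1])
        for t in range(len(video_feats) - 1)
    ]

def motion_consistency_loss(gen_feats, ref_feats):
    """Assumed form: mean squared difference between the two sets of correlation maps."""
    gen_maps = correlation_patterns(gen_feats)
    ref_maps = correlation_patterns(ref_feats)
    squared = [
        (g - r) ** 2
        for gen_map, ref_map in zip(gen_maps, ref_maps)
        for gen_row, ref_row in zip(gen_map, ref_map)
        for g, r in zip(gen_row, ref_row)
    ]
    return sum(squared) / len(squared)

# Toy data: 2 frames, 2 spatial positions, 2 channels.
# In the reference, the two features swap positions between frames (motion).
reference = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
# In the static video, nothing moves.
static = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]

loss_same = motion_consistency_loss(reference, reference)
loss_static = motion_consistency_loss(static, reference)
```

In this toy setup the loss is zero when the generated features follow the reference motion pattern and grows when they do not. In a training-free setting, a loss of this kind would typically be minimized at inference time rather than used to update model weights.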
Findings
Improves temporal consistency across various motion tasks.
Maintains high-quality motion guidance without extra training.
Achieves these gains efficiently, with no changes to the model architecture.
Abstract
In this paper, we address the challenge of generating temporally consistent videos with motion guidance. While many existing methods depend on additional control modules or inference-time fine-tuning, recent studies suggest that effective motion guidance is achievable without altering the model architecture or requiring extra training. Such approaches offer promising compatibility with various video generation foundation models. However, existing training-free methods often struggle to maintain consistent temporal coherence across frames or to follow guided motion accurately. In this work, we propose a simple yet effective solution that combines an initial-noise-based approach with a novel motion consistency loss, the latter being our key innovation. Specifically, we capture the inter-frame feature correlation patterns of intermediate features from a video diffusion model to represent…
Taxonomy
Methods: Diffusion
