FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation
Ariel Shaulov, Itay Hazan, Lior Wolf, Hila Chefer

TL;DR
FlowMo is a training-free method that improves motion coherence in pre-trained text-to-video diffusion models by dynamically reducing temporal variance, enhancing temporal consistency without retraining or additional inputs.
Contribution
FlowMo introduces a novel, training-free guidance technique that leverages the model's own predictions to enhance motion coherence in video generation.
Findings
Significantly improves motion coherence in various models
Maintains visual quality and prompt alignment
Operates without retraining or auxiliary inputs
Abstract
Text-to-video diffusion models are notoriously limited in their ability to model temporal aspects such as motion, physics, and dynamic interactions. Existing approaches address this limitation by retraining the model or introducing external conditioning signals to enforce temporal consistency. In this work, we explore whether a meaningful temporal representation can be extracted directly from the predictions of a pre-trained model without any additional training or auxiliary inputs. We introduce FlowMo, a novel training-free guidance method that enhances motion coherence using only the model's own predictions in each diffusion step. FlowMo first derives an appearance-debiased temporal representation by measuring the distance between latents corresponding to consecutive frames. This highlights the implicit temporal structure predicted by the model. It then estimates motion coherence by…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Human Motion and Animation · Computer Graphics and Visualization Techniques
