TL;DR
This paper introduces FVMD, a new metric that uses motion features to evaluate temporal and motion consistency in generated videos and that aligns well with human perception.
Contribution
We propose FVMD, a novel motion-based metric for assessing video quality, addressing the gap in evaluating temporal and motion consistency in video generation.
Findings
FVMD effectively detects temporal noise in videos.
FVMD aligns better with human perception than existing metrics.
Motion features improve video quality assessment models.
Abstract
Significant advancements have been made in video generative models recently. Unlike image generation, video generation presents greater challenges, requiring not only generating high-quality frames but also ensuring temporal consistency across these frames. Despite the impressive progress, research on metrics for evaluating the quality of generated videos, especially concerning temporal and motion consistency, remains underexplored. To bridge this research gap, we propose the Fréchet Video Motion Distance (FVMD) metric, which focuses on evaluating motion consistency in video generation. Specifically, we design explicit motion features based on key point tracking, and then measure the similarity between these features via the Fréchet distance. We conduct sensitivity analysis by injecting noise into real videos to verify the effectiveness of FVMD. Further, we carry out a large-scale human…
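To illustrate the "Fréchet distance between feature sets" step and the noise-injection sensitivity check mentioned in the abstract, here is a minimal Python sketch. It assumes the closed form for two Gaussians that FID-style metrics commonly use; the abstract does not state FVMD's exact formulation. The function name `frechet_distance` is hypothetical, and the features are random stand-ins rather than the key-point-tracking motion features the paper describes.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each input has shape (n_samples, dim) with dim >= 2.
    Assumption: Gaussian closed form, as in FID-style metrics.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (up to numerical error).
    eigvals = np.linalg.eigvals(cov_a @ cov_b).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

# Stand-in "motion features": random vectors, NOT real key-point tracks.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
mild = real + rng.normal(scale=0.5, size=real.shape)   # light injected noise
heavy = real + rng.normal(scale=2.0, size=real.shape)  # strong injected noise

d_same = frechet_distance(real, real)
d_mild = frechet_distance(real, mild)
d_heavy = frechet_distance(real, heavy)
```

Under these assumptions the distance is close to zero for identical feature sets and grows as more noise is injected, which is the kind of behavior a sensitivity analysis would look for.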
