Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality
Ge Ya Luo, Gian Mario Favero, Zhi Hao Luo, Alexia Jolicoeur-Martineau, Christopher Pal

TL;DR
This paper critically examines the limitations of the FVD metric for video quality evaluation and introduces JEDi, a new embedding distance that outperforms FVD in reliability, sample efficiency, and correlation with human judgment.
Contribution
The paper identifies key shortcomings of FVD and proposes JEDi, a novel metric based on a Joint Embedding Predictive Architecture (JEPA), which demonstrates superior performance on multiple datasets.
Findings
JEDi requires only 16% of the samples that FVD needs.
JEDi increases alignment with human evaluation by 34%.
FVD's limitations include non-Gaussian features and insensitivity to temporal distortions.
Abstract
The Fréchet Video Distance (FVD) is a widely adopted metric for evaluating video generation distribution quality. However, its effectiveness relies on critical assumptions. Our analysis reveals three significant limitations: (1) the non-Gaussianity of the Inflated 3D Convnet (I3D) feature space; (2) the insensitivity of I3D features to temporal distortions; (3) the impractical sample sizes required for reliable estimation. These findings undermine FVD's reliability and show that FVD falls short as a standalone metric for video generation evaluation. After extensive analysis of a wide range of metrics and backbone architectures, we propose JEDi, the JEPA Embedding Distance, based on features derived from a Joint Embedding Predictive Architecture, measured using Maximum Mean Discrepancy with polynomial kernel. Our experiments on multiple open-source datasets show clear evidence that it…
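The abstract describes JEDi as a Maximum Mean Discrepancy (MMD) with a polynomial kernel computed over JEPA features. The sketch below illustrates only that distance, not the authors' implementation. It assumes the feature vectors have already been extracted by some video encoder and are passed in as plain lists of floats. The kernel parameters (degree 3, offset 1, scale 1/d) are illustrative assumptions, not values taken from the paper, and the names `poly_kernel` and `mmd2_unbiased` are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def poly_kernel(x: Vector, y: Vector, degree: int = 3, coef0: float = 1.0) -> float:
    """Polynomial kernel k(x, y) = (x.y / d + coef0) ** degree.

    The 1/d scale, degree and coef0 are assumed defaults for illustration.
    """
    gamma = 1.0 / len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def mmd2_unbiased(xs: Sequence[Vector], ys: Sequence[Vector],
                  degree: int = 3, coef0: float = 1.0) -> float:
    """Unbiased estimate of squared MMD between two sets of feature vectors.

    xs: features of real videos, ys: features of generated videos
    (both assumed to be precomputed by an encoder).
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each set needs at least two feature vectors")

    # Within-set terms skip the diagonal (i == j) to keep the estimator unbiased.
    k_xx = sum(poly_kernel(xs[i], xs[j], degree, coef0)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(poly_kernel(ys[i], ys[j], degree, coef0)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # Cross-set term averages over all pairs.
    k_xy = sum(poly_kernel(x, y, degree, coef0) for x in xs for y in ys) / (m * n)

    return k_xx + k_yy - 2.0 * k_xy
```

Unlike the Fréchet distance, this estimator makes no Gaussian assumption about the feature distribution. Because it is unbiased, it can return small negative values when the two sets come from the same distribution.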
Taxonomy
Topics: Image and Video Quality Assessment · Video Coding and Compression Technologies · Advanced Image Processing Techniques
