TL;DR
This paper reveals biases in FID and Inception Score metrics for generative models, proposes bias correction methods via extrapolation, and demonstrates improved estimates and training techniques using Quasi-Monte Carlo methods.
Contribution
It introduces a method that extrapolates FID and IS to their infinite-sample values, giving effectively unbiased estimates and removing a bias that depends on both the model and the sample size.
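The extrapolation idea can be sketched as follows, assuming the finite-sample score is approximately linear in 1/N so that the intercept of a straight-line fit estimates the infinite-sample value. The function name, the sample sizes, and the synthetic scores are hypothetical and only illustrate the fitting step; a real evaluation would substitute FID or IS values computed at each N.

```python
import numpy as np


def extrapolate_to_infinite_samples(sample_sizes, scores):
    """Fit score ~ slope * (1/N) + intercept and return (intercept, slope).

    The intercept is the value of the fitted line at 1/N = 0,
    i.e. the extrapolated score for an infinite number of samples.
    """
    inv_n = 1.0 / np.asarray(sample_sizes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(inv_n, scores, deg=1)
    return intercept, slope


# Synthetic illustration (not real FID values): a score with a 1/N bias term.
sizes = np.array([5000, 10000, 20000, 50000])
true_value, bias_coeff = 10.0, 4000.0
fake_scores = true_value + bias_coeff / sizes

estimate, slope = extrapolate_to_infinite_samples(sizes, fake_scores)
print(f"extrapolated score: {estimate:.4f}, fitted bias coefficient: {slope:.1f}")
```

Because the fitted slope differs from model to model, two models compared at the same finite N can be ranked differently than they would be at the intercept, which is the situation the summary describes.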
Findings
Bias in FID and IS depends on the model and sample size.
Extrapolation methods effectively reduce bias in these metrics.
Using Quasi-Monte Carlo improves finite-sample estimates of FID and IS.
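As an illustration of the Quasi-Monte Carlo finding, the sketch below draws Gaussian latent vectors from a low-discrepancy sequence instead of a pseudo-random generator. It is an assumption-laden example: the summary does not say which sequence is used, so a Halton sequence is swapped in here for brevity, and the helper names (radical_inverse, halton_gaussian) are hypothetical. The resulting vectors would be fed to a generator in place of i.i.d. normal draws before computing FID or IS.

```python
from statistics import NormalDist

# First few primes, one base per latent dimension (illustrative limit).
PRIMES = [2, 3, 5, 7, 11, 13]


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_gaussian(n_points, dim):
    """Return n_points latent vectors of length dim.

    Each coordinate is a Halton point in (0, 1) mapped through the
    inverse standard normal CDF, giving quasi-random Gaussian samples.
    """
    if dim > len(PRIMES):
        raise ValueError("this sketch only supports up to 6 dimensions")
    inv_cdf = NormalDist().inv_cdf
    # Start at index 1 so no coordinate is exactly 0 (inv_cdf needs 0 < p < 1).
    return [
        [inv_cdf(radical_inverse(i, PRIMES[d])) for d in range(dim)]
        for i in range(1, n_points + 1)
    ]


latents = halton_gaussian(7, 2)
for z in latents[:3]:
    print([round(v, 4) for v in z])
```

Low-discrepancy points cover the latent space more evenly than independent draws, which is why they can reduce the variance of finite-sample estimates.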
Abstract
This paper shows that two commonly used evaluation metrics for generative models, the Fréchet Inception Distance (FID) and the Inception Score (IS), are biased: the expected value of the score computed for a finite sample set is not the true value of the score. Worse, the paper shows that the bias term depends on the particular model being evaluated, so model A may get a better score than model B simply because model A's bias term is smaller. This effect cannot be fixed by evaluating at a fixed number of samples. This means all comparisons using FID or IS as currently computed are unreliable. We then show how to extrapolate the score to obtain an effectively bias-free estimate of scores computed with an infinite number of samples, which we term FID_∞ and IS_∞. In turn, this effectively bias-free estimate requires good estimates…
Videos
Effectively Unbiased FID and Inception Score and Where to Find Them · YouTube
Taxonomy
Methods: Convolution
