Adapting Frechet Audio Distance for Generative Music Evaluation
Azalea Gui, Hannes Gamper, Sebastian Braun, Dimitra Emmanouilidou

TL;DR
This paper evaluates and improves the Frechet Audio Distance (FAD) as a metric for generative music quality. It addresses sample size bias and the choice of audio embeddings and reference sets, proposes methods that improve correlation with perceptual quality, and releases a practical toolkit.
Contribution
The paper identifies limitations of FAD, proposes score extrapolation to reduce sample size bias, and introduces an adapted FAD toolkit for more accurate music evaluation.
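For orientation, the standard definition of the Frechet distance between two Gaussians, on which FAD is built, is written out here together with one plausible form of the extrapolation. The symbols are assumptions introduced for illustration: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the mean and covariance of the reference and generated embedding sets, $N$ is the number of evaluated samples, and $c$ is a fitted constant. The linear-in-$1/N$ bias model follows the analogous extrapolation used for FID and may differ in detail from the paper's procedure.

```latex
\mathrm{FAD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

% Assumed bias model: scores measured at finite N approach the limit linearly in 1/N
\mathrm{FAD}_N \approx \mathrm{FAD}_\infty + \frac{c}{N}
```

Under this model, fitting a line to scores measured at several values of $N$ and reading off the intercept gives an estimate of $\mathrm{FAD}_\infty$.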
Findings
FAD scores correlate better with perceptual quality when computed with well-chosen audio embeddings and high-quality reference sets.
Extrapolating FAD scores towards an infinite sample size reduces sample size bias.
Per-song FAD can detect outliers and predict perceptual quality across models.
Abstract
The growing popularity of generative music models underlines the need for perceptually relevant, objective music quality metrics. The Frechet Audio Distance (FAD) is commonly used for this purpose even though its correlation with perceptual quality is understudied. We show that FAD performance may be hampered by sample size bias, poor choice of audio embeddings, or the use of biased or low-quality reference sets. We propose reducing sample size bias by extrapolating scores towards an infinite sample size. Through comparisons with MusicCaps labels and a listening test we identify audio embeddings and music reference sets that yield FAD scores well-correlated with acoustic and musical quality. Our results suggest that per-song FAD can be useful to identify outlier samples and predict perceptual quality for a range of music sets and generative models. Finally, we release a toolkit that…
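The extrapolation idea in the abstract can be sketched as follows. This is a minimal illustration, not the released toolkit: the function names `frechet_distance` and `extrapolate_fad`, the subset sizes, and the synthetic Gaussian "embeddings" are all assumptions made for the example, and the linear fit in 1/N is one plausible way to extrapolate towards an infinite sample size. In practice the embeddings would come from an audio embedding model rather than a random generator.

```python
import numpy as np


def frechet_distance(emb_ref, emb_gen):
    """Frechet distance between Gaussians fitted to two embedding sets (rows = samples)."""
    mu_r, mu_g = emb_ref.mean(axis=0), emb_gen.mean(axis=0)
    cov_r = np.cov(emb_ref, rowvar=False)
    cov_g = np.cov(emb_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_r @ cov_g, which are real and non-negative up to numerical noise.
    eigvals = np.linalg.eigvals(cov_r @ cov_g).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)


def extrapolate_fad(emb_ref, emb_gen, sizes, n_repeats=5, seed=0):
    """Fit score = intercept + slope / N over random subsets; return the intercept."""
    rng = np.random.default_rng(seed)
    inv_n, scores = [], []
    for n in sizes:
        for _ in range(n_repeats):
            idx = rng.choice(len(emb_gen), size=n, replace=False)
            inv_n.append(1.0 / n)
            scores.append(frechet_distance(emb_ref, emb_gen[idx]))
    slope, intercept = np.polyfit(inv_n, scores, deg=1)
    return float(intercept)


# Synthetic stand-in for embeddings (assumption): both sets are drawn from the
# same distribution, so the true distance is zero and any score is pure bias.
rng = np.random.default_rng(42)
ref = rng.normal(size=(2000, 8))
gen = rng.normal(size=(2000, 8))

fad_small = frechet_distance(ref, gen[:100])  # biased upward by the small sample
fad_inf = extrapolate_fad(ref, gen, sizes=[100, 200, 400, 800])
print(f"FAD at N=100: {fad_small:.4f}, extrapolated: {fad_inf:.4f}")
```

Because both synthetic sets share a distribution, the score at N=100 reflects only finite-sample bias, and the extrapolated intercept should land much closer to zero. The same `frechet_distance` helper could in principle be applied to the embeddings of a single song against a reference set to obtain a per-song score, though that usage is also an assumption here.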
Videos
Final intern talk: Improving Frechet Audio Distance for Generative Music Evaluation · YouTube
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
