Adapting Frechet Audio Distance for Generative Music Evaluation

Azalea Gui; Hannes Gamper; Sebastian Braun; Dimitra Emmanouilidou

arXiv:2311.01616·eess.AS·March 7, 2024·ICASSP·2 cites

TL;DR

This paper examines how well the Frechet Audio Distance (FAD) tracks the perceptual quality of generated music. It shows that sample size bias, the choice of audio embedding, and the quality of the reference set can all degrade the metric, and it proposes fixes alongside a released toolkit.

Contribution

It identifies limitations of FAD, proposes extrapolating scores towards an infinite sample size to reduce sample size bias, and releases a toolkit for adapted FAD evaluation of music.

Findings

01

FAD scores correlate better with perceptual quality when computed with well-chosen audio embeddings and unbiased, high-quality reference sets.

02

Extrapolating FAD scores towards an infinite sample size reduces sample size bias.

03

Per-song FAD can detect outliers and predict perceptual quality across models.
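
For orientation, the quantities behind these findings can be written out. The first line is the standard Frechet distance between two Gaussians fitted to reference embeddings (mean \mu_r, covariance \Sigma_r) and generated embeddings (\mu_g, \Sigma_g). The second line is an assumption borrowed from related work on the Frechet Inception Distance, not something this page states: that a score computed from N samples deviates from its infinite-sample value by a term roughly proportional to 1/N, with an unknown constant c.

```latex
\mathrm{FAD}(r, g) = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)

\mathrm{FAD}_N \approx \mathrm{FAD}_\infty + \frac{c}{N}
```

Under that assumption, fitting a line to scores measured at several values of N and reading off the intercept at 1/N = 0 gives an estimate of \mathrm{FAD}_\infty.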

Abstract

The growing popularity of generative music models underlines the need for perceptually relevant, objective music quality metrics. The Frechet Audio Distance (FAD) is commonly used for this purpose even though its correlation with perceptual quality is understudied. We show that FAD performance may be hampered by sample size bias, poor choice of audio embeddings, or the use of biased or low-quality reference sets. We propose reducing sample size bias by extrapolating scores towards an infinite sample size. Through comparisons with MusicCaps labels and a listening test we identify audio embeddings and music reference sets that yield FAD scores well-correlated with acoustic and musical quality. Our results suggest that per-song FAD can be useful to identify outlier samples and predict perceptual quality for a range of music sets and generative models. Finally, we release a toolkit that…
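
The extrapolation idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the released toolkit: the function names `frechet_distance` and `extrapolated_fad`, the `sizes` and `repeats` parameters, and the linear-in-1/N fit are all assumptions made here for the example. Embeddings are taken to be NumPy arrays with one row per sample, produced by whatever audio embedding model is in use.

```python
import numpy as np


def frechet_distance(x, y):
    """Frechet distance between Gaussians fitted to two embedding sets.

    x, y: arrays of shape (num_samples, embedding_dim).
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((cov_x cov_y)^{1/2}) equals the sum of square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative for
    # covariance matrices; clip guards against tiny negative round-off.
    eigvals = np.linalg.eigvals(cov_x @ cov_y).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def extrapolated_fad(ref, gen, sizes, rng, repeats=5):
    """Estimate the infinite-sample score (hypothetical helper).

    Assumes score(N) ~ score_inf + c / N, measures the score on random
    subsets of `gen` at each N in `sizes`, fits a line in 1/N, and
    returns the intercept.
    """
    inv_n, scores = [], []
    for n in sizes:
        for _ in range(repeats):
            idx = rng.choice(len(gen), size=n, replace=False)
            inv_n.append(1.0 / n)
            scores.append(frechet_distance(ref, gen[idx]))
    slope, intercept = np.polyfit(inv_n, scores, deg=1)
    return float(intercept)
```

A call might look like `extrapolated_fad(ref, gen, sizes=[100, 200, 400, 800], rng=np.random.default_rng(0))`. A per-song score, in the same spirit, could be obtained by passing the embedding frames of a single song as the second argument to `frechet_distance`, although whether that matches the paper's exact procedure is not confirmed by this page.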

Code & Models

Videos

Final intern talk: Improving Frechet Audio Distance for Generative Music Evaluation (YouTube)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing