Parameter Sensitivity of Deep-Feature based Evaluation Metrics for Audio Textures
Chitralekha Gupta, Yize Wei, Zequn Gong, Purnima Kamath, Zhuoyao Li, Lonce Wyse

TL;DR
This paper systematically studies how various audio quality metrics respond to changes in statistical parameters of audio textures, highlighting their sensitivities and potential for improved evaluation of audio texture synthesis.
Contribution
It introduces new deep-feature based metrics and evaluates how sensitive both these and existing metrics are to parameter variations in audio textures, which it presents as a first in this research area.
Findings
Metrics show different sensitivities to texture parameters.
Standard metrics like Inception score and Fréchet Audio Distance are sensitive to certain parameters.
New metrics based on Gram matrices and cochlear models also exhibit distinct sensitivities.
Abstract
Standard evaluation metrics such as the Inception score and Fréchet Audio Distance provide a general audio quality distance metric between the synthesized audio and reference clean audio. However, the sensitivity of these metrics to variations in the statistical parameters that define an audio texture is not well studied. In this work, we provide a systematic study of the sensitivity of some of the existing audio quality evaluation metrics to parameter variations in audio textures. Furthermore, we also study three more potentially parameter-sensitive metrics for audio texture synthesis, (a) a Gram matrix based distance, (b) an Accumulated Gram metric using a summarized version of the Gram matrices, and (c) a cochlear-model based statistical features metric. These metrics use deep features that summarize the statistics of any given audio texture, thus being inherently sensitive to…
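Example
To make the idea of a Gram matrix based distance concrete, the sketch below computes a time-averaged Gram matrix from a channels-by-frames feature map and compares two such matrices by mean squared difference. This is an illustration under assumptions, not the paper's formulation: the function names `gram_matrix` and `gram_distance` are hypothetical, the normalization and the mean-squared comparison are choices made here, and plain nested lists stand in for the deep features a network would produce.

```python
from typing import List

Matrix = List[List[float]]


def gram_matrix(features: Matrix) -> Matrix:
    """Time-averaged Gram matrix of a (channels x frames) feature map.

    Entry (i, j) is the mean over frames of channel_i * channel_j, so the
    result summarizes channel co-activation and discards frame order.
    """
    n_frames = len(features[0])
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / n_frames for row_j in features]
        for row_i in features
    ]


def gram_distance(feats_a: Matrix, feats_b: Matrix) -> float:
    """Mean squared difference between two Gram matrices (assumed comparison)."""
    g_a, g_b = gram_matrix(feats_a), gram_matrix(feats_b)
    n = len(g_a)
    total = sum((g_a[i][j] - g_b[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)


# Toy feature maps: 2 channels, 2 frames each (hypothetical values).
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[1.0, 1.0], [1.0, 1.0]]
```

Calling `gram_distance(reference, reference)` gives zero, and reordering the frames of a feature map leaves its Gram matrix unchanged, which is why such a statistic suits textures, where the overall statistics matter more than the exact timing of events.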
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
