A Study on Robustness to Perturbations for Representations of Environmental Sound
Sangeeta Srivastava, Ho-Hsiang Wu, Joao Rulff, Magdalena Fuentes, Mark Cartwright, Claudio Silva, Anish Arora, Juan Pablo Bello

TL;DR
This paper extends the HEAR evaluation framework to assess the robustness of environmental sound embeddings against channel effects by injecting perturbations and analyzing embedding shifts, revealing that OpenL3 is more robust than YAMNet.
Contribution
It introduces a task-independent method to evaluate embedding invariance to channel effects, combining perturbation analysis with multiple distance measures for better robustness prediction.
Findings
OpenL3 embeddings are more robust to channel effects than YAMNet.
Fréchet Audio Distance correlates well with downstream performance drops.
Multiple distance measures are necessary for comprehensive robustness evaluation.
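To make the second finding concrete, here is a minimal sketch of the Fréchet distance that underlies Fréchet Audio Distance: each set of embeddings is summarized by a Gaussian (mean and covariance), and the distance between the two Gaussians is computed in closed form. This is not the paper's evaluation code. The function name `frechet_distance` and the random arrays standing in for clean and perturbed embeddings are assumptions made for illustration.

```python
import numpy as np


def frechet_distance(emb_a, emb_b):
    """Fréchet distance between Gaussian fits of two embedding sets.

    emb_a, emb_b: arrays of shape (n_samples, n_dims).
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)

    # Tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Hypothetical stand-ins for embeddings of clean and perturbed audio.
rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 8))
shifted = clean + 0.5  # a uniform shift of every embedding dimension

same = frechet_distance(clean, clean)
moved = frechet_distance(clean, shifted)
```

In this toy setup the covariances are identical, so the distance reduces to the squared norm of the mean shift. With real embeddings the covariance term also contributes, which is one reason a distribution-level measure can behave differently from a per-clip distance.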
Abstract
Audio applications involving environmental sound analysis increasingly use general-purpose audio representations, also known as embeddings, for transfer learning. Recently, Holistic Evaluation of Audio Representations (HEAR) evaluated twenty-nine embedding models on nineteen diverse tasks. However, the evaluation's effectiveness depends on the variation already captured within a given dataset. Therefore, for a given data domain, it is unclear how the representations would be affected by the variations caused by myriad microphones' range and acoustic conditions -- commonly known as channel effects. We aim to extend HEAR to evaluate invariance to channel effects in this work. To accomplish this, we imitate channel effects by injecting perturbations to the audio signal and measure the shift in the new (perturbed) embeddings with three distance measures, making the evaluation…
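The perturb-and-measure procedure described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `toy_embed` is a hypothetical stand-in for a real embedding model such as OpenL3 or YAMNet, additive Gaussian noise at a target signal-to-noise ratio is only one possible way to imitate a channel effect, and cosine distance is used as an example measure without implying it is one of the three the paper uses.

```python
import numpy as np


def add_noise(audio, snr_db, rng):
    """Perturb a signal with white Gaussian noise at the given SNR (dB)."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise


def toy_embed(audio, n_bands=16):
    """Hypothetical embedding: log of mean spectral magnitude per band."""
    spectrum = np.abs(np.fft.rfft(audio))
    bands = np.array_split(spectrum, n_bands)
    return np.log1p(np.array([band.mean() for band in bands]))


def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone

clean_emb = toy_embed(clean)

# Embedding shift at increasing perturbation strength (lower SNR = more noise).
shifts = {
    snr_db: cosine_distance(clean_emb, toy_embed(add_noise(clean, snr_db, rng)))
    for snr_db in (40, 20, 0)
}
```

A more robust embedding would show smaller shifts at the same perturbation strength. Swapping `toy_embed` for a real model and repeating the measurement over a dataset gives a task-independent robustness profile of the kind the paper aims for.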
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
