A Study on Robustness to Perturbations for Representations of Environmental Sound

Sangeeta Srivastava; Ho-Hsiang Wu; Joao Rulff; Magdalena Fuentes; Mark Cartwright; Claudio Silva; Anish Arora; Juan Pablo Bello

arXiv:2203.10425·cs.SD·July 8, 2022

Open Access

TL;DR

This paper extends the HEAR evaluation framework to assess the robustness of environmental sound embeddings against channel effects by injecting perturbations and analyzing embedding shifts, revealing that OpenL3 is more robust than YAMNet.

Contribution

It introduces a task-independent method to evaluate embedding invariance to channel effects, combining perturbation analysis with multiple distance measures for better robustness prediction.

Findings

01. OpenL3 embeddings are more robust to channel effects than YAMNet.

02. Fréchet Audio Distance correlates well with downstream performance drops.

03. Multiple distance measures are necessary for comprehensive robustness evaluation.

Abstract

Audio applications involving environmental sound analysis increasingly use general-purpose audio representations, also known as embeddings, for transfer learning. Recently, Holistic Evaluation of Audio Representations (HEAR) evaluated twenty-nine embedding models on nineteen diverse tasks. However, the evaluation's effectiveness depends on the variation already captured within a given dataset. Therefore, for a given data domain, it is unclear how the representations would be affected by the variations caused by the range of microphones and acoustic conditions, commonly known as channel effects. In this work, we aim to extend HEAR to evaluate invariance to channel effects. To accomplish this, we imitate channel effects by injecting perturbations into the audio signal and measure the shift in the new (perturbed) embeddings with three distance measures, making the evaluation…
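The perturb-then-measure idea described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: `toy_embed` is a hypothetical stand-in for an embedding model (it is not OpenL3 or YAMNet), `perturb` imitates a channel effect with a simple gain change plus additive white noise, and `frechet_distance` computes the Fréchet distance between Gaussians fitted to two embedding sets, which is the quantity underlying Fréchet Audio Distance. The paper's actual perturbations and its full set of distance measures are not reproduced here.

```python
import numpy as np


def toy_embed(clips, n_bands=8):
    """Hypothetical stand-in for an embedding model (NOT OpenL3/YAMNet):
    log of the mean spectral magnitude in n_bands frequency bands."""
    spec = np.abs(np.fft.rfft(clips, axis=1))       # (n_clips, n_bins)
    bands = np.array_split(spec, n_bands, axis=1)   # list of band slices
    return np.stack([np.log1p(b.mean(axis=1)) for b in bands], axis=1)


def perturb(clips, gain_db=0.0, noise_std=0.0, rng=None):
    """Assumed channel-effect imitation: gain change plus white noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    gain = 10.0 ** (gain_db / 20.0)
    return gain * clips + noise_std * rng.standard_normal(clips.shape)


def frechet_distance(a, b):
    """Frechet distance between Gaussians fitted to embedding sets a and b:
    ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^(1/2))."""
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    cov_a = np.cov(a, rowvar=False)
    cov_b = np.cov(b, rowvar=False)
    # Tr((C_a C_b)^(1/2)) equals the sum of square roots of the
    # eigenvalues of C_a C_b, which are real and non-negative in theory.
    eig = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


# Synthetic "clean" clips (white noise), used only to exercise the code.
rng = np.random.default_rng(0)
clean = rng.standard_normal((200, 1024))

emb_clean = toy_embed(clean)
emb_pert = toy_embed(perturb(clean, gain_db=-6.0, noise_std=0.1, rng=rng))

d_same = frechet_distance(emb_clean, emb_clean)   # reference: no shift
d_shift = frechet_distance(emb_clean, emb_pert)   # shift under perturbation
print(f"clean vs clean:     {d_same:.6f}")
print(f"clean vs perturbed: {d_shift:.6f}")
```

In this sketch, a larger distance between clean and perturbed embeddings indicates that the embedding is less invariant to the simulated channel effect. Comparing two embedding models would amount to swapping `toy_embed` for each model and comparing the resulting distances under the same perturbation.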

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies