Investigating the Sensitivity of Pre-trained Audio Embeddings to Common Effects

Victor Deng (ENS-PSL), Changhong Wang (LTCI, S2A, IDS), Gael Richard (S2A, IDS, LTCI), Brian McFee (NYU)

arXiv:2501.15900·cs.LG·January 28, 2025

TL;DR

This paper examines how embeddings from pre-trained audio models (OpenL3, PANNs, and CLAP) respond to common audio effects. It finds that the embeddings do not encode these effects linearly, and that removing estimated effect directions does not improve robustness.

Contribution

The paper proposes using canonical correlation analysis to quantify the dimensionality and linearizability of the trajectories that audio effects trace in embedding space, and shows that these deformations are high-dimensional and non-linear.

Findings

01. There is a direction in embedding space along which embeddings move monotonically as effect strength increases.

02. The subspace containing the displacements is high-dimensional, so the embeddings do not linearize the effects.

03. Removing the estimated effect directions does not improve robustness.
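
The three findings can coexist, which a toy example helps make concrete. The sketch below is not the authors' code and uses no real OpenL3, PANNs, or CLAP embeddings: it builds a synthetic trajectory (an assumption) with a linear and a curved component, estimates one direction by least squares (equivalent to canonical correlation analysis when one side is the scalar effect strength), and then projects that direction out. All names (`estimate_direction`, `project_out`, `traj`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an embedding trajectory (assumption, not real data).
dim = 16
strengths = np.linspace(0.0, 1.0, 21)   # hypothetical effect strengths
clean = rng.normal(size=dim)            # embedding of the unprocessed clip
u = rng.normal(size=dim)                # linear part of the deformation
v = rng.normal(size=dim)                # curved part of the deformation
traj = clean + np.outer(strengths, u) + np.outer(strengths**2, v)


def estimate_direction(embeddings, strengths):
    """Unit direction whose projection best predicts effect strength
    in the least-squares sense (CCA with a one-dimensional target)."""
    X = embeddings - embeddings.mean(axis=0)
    y = strengths - strengths.mean()
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w / np.linalg.norm(w)


def project_out(embeddings, direction):
    """Remove the component of each embedding along a unit direction."""
    return embeddings - np.outer(embeddings @ direction, direction)


d = estimate_direction(traj, strengths)

# Finding 01 (toy version): projections onto d increase with strength.
projections = traj @ d
is_monotonic = bool(np.all(np.diff(projections) > 0))

# Finding 02 (toy version): displacements span more than one dimension.
displacements = traj - clean
rank = int(np.linalg.matrix_rank(displacements, tol=1e-8))

# Finding 03 (toy version): after removing d, the effect still moves
# the embeddings away from the clean one.
cleaned = project_out(traj, d)
residual = np.linalg.norm(cleaned - cleaned[0], axis=1)

print(is_monotonic, rank, residual.max())
```

In this toy setting the single estimated direction captures a monotonic trend, yet the curved component survives the projection. Whether and how strongly this happens for real embeddings is what the paper measures; the example only shows why one monotonic direction does not imply a removable effect.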

Abstract

In recent years, foundation models have significantly advanced data-driven systems across various domains. Yet, their underlying properties, especially when functioning as feature extractors, remain under-explored. In this paper, we investigate the sensitivity to audio effects of audio embeddings extracted from widely-used foundation models, including OpenL3, PANNs, and CLAP. We focus on audio effects as the source of sensitivity due to their prevalent presence in large audio datasets. By applying parameterized audio effects (gain, low-pass filtering, reverberation, and bitcrushing), we analyze the correlation between the deformation trajectories and the effect strength in the embedding space. We propose to quantify the dimensionality and linearizability of the deformation trajectories induced by audio effects using canonical correlation analysis. We find that there exists a direction…
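
As an illustration of what a parameterized effect looks like, here is a minimal NumPy sketch of three of the four effects named in the abstract. The abstract does not specify the authors' implementations, so the definitions below (gain in decibels, uniform quantization for bitcrushing, a first-order filter for low-pass) are assumptions, and the function names are hypothetical. Reverberation is omitted because it needs an impulse response or a room model.

```python
import numpy as np


def apply_gain(x, gain_db):
    """Scale the waveform by a gain given in decibels."""
    return x * 10.0 ** (gain_db / 20.0)


def bitcrush(x, bits):
    """Uniformly quantize samples in [-1, 1] to roughly `bits` bits."""
    levels = 2 ** (bits - 1)
    return np.round(np.clip(x, -1.0, 1.0) * levels) / levels


def one_pole_lowpass(x, cutoff_hz, sr):
    """First-order IIR low-pass filter (a simple stand-in, not a
    specific filter design from the paper)."""
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff_hz / sr)
    y = np.empty_like(x)
    state = 0.0
    for n, sample in enumerate(x):
        state += alpha * (sample - state)
        y[n] = state
    return y


def rms(x):
    """Root-mean-square level of a waveform."""
    return float(np.sqrt(np.mean(x**2)))


# Sweep each effect's strength parameter over a test tone.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
high_tone = 0.5 * np.sin(2 * np.pi * 6000.0 * t)

louder = {g: apply_gain(tone, g) for g in (-12.0, 0.0, 12.0)}
crushed = {b: bitcrush(tone, b) for b in (2, 4, 8)}
filtered = one_pole_lowpass(high_tone, cutoff_hz=500.0, sr=sr)
```

Each processed waveform would then be passed through the embedding model, giving one point per strength value on the deformation trajectory that the abstract describes.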
