Transfer Learning and Bias Correction with Pre-trained Audio Embeddings

Changhong Wang; Gaël Richard; Brian McFee

arXiv:2307.10834 · eess.AS · July 21, 2023

TL;DR

This paper explores how pre-trained audio embeddings used in music instrument recognition can carry biases from their training data, affecting their ability to generalize across different datasets, and proposes methods to mitigate these biases.

Contribution

The paper systematically analyzes how bias propagates through pre-trained audio embeddings and introduces post-processing techniques that improve cross-dataset generalization.

Findings

01 Pre-trained embeddings show similar performance on a single dataset but differ in cross-dataset generalization.

02 Dataset identity and genre distribution are key sources of bias.

03 Post-processing countermeasures can reduce bias and improve generalization.
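
To make the idea of a post-processing countermeasure concrete, here is a minimal sketch of one possible approach: estimate a direction in embedding space that separates two datasets, then project it out of every embedding. This is an illustration under stated assumptions, not necessarily the procedure used in the paper. The mean-difference direction, the synthetic data, and the helper names `dataset_direction` and `project_out` are all hypothetical.

```python
import numpy as np


def dataset_direction(emb_a, emb_b):
    """Unit vector along the difference of the two datasets' mean embeddings.

    Assumption: the mean difference is a simple stand-in for a learned
    dataset-separating direction (e.g. one found by a linear classifier).
    """
    diff = emb_a.mean(axis=0) - emb_b.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0:
        raise ValueError("datasets have identical mean embeddings")
    return diff / norm


def project_out(emb, direction):
    """Remove the component of each embedding (row) along `direction`."""
    return emb - np.outer(emb @ direction, direction)


# Synthetic stand-ins for embeddings extracted from two datasets:
# both share the same noise distribution, shifted apart along axis 0.
rng = np.random.default_rng(0)
dim = 8
offset = np.zeros(dim)
offset[0] = 3.0  # hypothetical dataset-specific shift
emb_a = rng.normal(size=(200, dim)) + offset
emb_b = rng.normal(size=(200, dim)) - offset

w = dataset_direction(emb_a, emb_b)
clean_a = project_out(emb_a, w)
clean_b = project_out(emb_b, w)

# Distance between dataset means before and after the projection.
gap_before = np.linalg.norm(emb_a.mean(axis=0) - emb_b.mean(axis=0))
gap_after = np.linalg.norm(clean_a.mean(axis=0) - clean_b.mean(axis=0))
print(f"mean gap before: {gap_before:.3f}, after: {gap_after:.2e}")
```

In this toy setting the projection removes the mean offset between the two datasets while leaving the other directions of the embedding untouched. Whether doing so helps a downstream instrument classifier generalize depends on how much class information lies along the removed direction.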

Abstract

Deep neural network models have become the dominant approach to a large variety of tasks within music information retrieval (MIR). These models generally require large amounts of (annotated) training data to achieve high accuracy. Because not all applications in MIR have sufficient quantities of training data, it is becoming increasingly common to transfer models across domains. This approach allows representations derived for one task to be applied to another, and can result in high accuracy with less stringent training data requirements for the downstream task. However, the properties of pre-trained audio embeddings are not fully understood. Specifically, and unlike traditionally engineered features, the representations extracted from pre-trained deep networks may embed and propagate biases from the model's training regime. This work investigates the phenomenon of bias propagation in…

Code & Models

Repositories

changhongw/audio-embedding-bias (official; framework tag: tf)

Taxonomy

Topics: Music and Audio Processing · Diverse Musicological Studies · Speech Recognition and Synthesis