Transfer Learning and Bias Correction with Pre-trained Audio Embeddings
Changhong Wang, Gaël Richard, Brian McFee

TL;DR
This paper explores how pre-trained audio embeddings used in music instrument recognition can carry biases from their training data, affecting their ability to generalize across different datasets, and proposes methods to mitigate these biases.
Contribution
The paper systematically analyzes bias propagation in pre-trained audio embeddings and introduces post-processing techniques to enhance cross-dataset generalization.
Findings
Pre-trained embeddings show similar performance on a single dataset but differ in cross-dataset generalization.
Dataset identity and genre distribution are key sources of bias.
Post-processing countermeasures can reduce bias and improve generalization.
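To make the idea of a post-processing countermeasure concrete, here is a minimal sketch of one plausible form it could take: estimating a direction in embedding space that separates two datasets and projecting it out. This is an illustrative assumption, not necessarily the method the authors use. The function name `remove_dataset_direction`, the mean-difference estimate of the direction, and the synthetic data are all hypothetical.

```python
import numpy as np


def remove_dataset_direction(emb, dataset_ids):
    """Project out the direction separating two datasets' mean embeddings.

    Hypothetical helper for illustration only: the direction is estimated
    as the difference of per-dataset means, which is an assumption of this
    sketch rather than something stated in the summary above.
    """
    emb = np.asarray(emb, dtype=float)
    dataset_ids = np.asarray(dataset_ids)
    labels = np.unique(dataset_ids)
    if labels.size != 2:
        raise ValueError("this sketch handles exactly two datasets")

    mean_a = emb[dataset_ids == labels[0]].mean(axis=0)
    mean_b = emb[dataset_ids == labels[1]].mean(axis=0)
    direction = mean_a - mean_b
    norm = np.linalg.norm(direction)
    if norm == 0:
        return emb.copy()  # the datasets already share the same mean

    w = direction / norm
    # Subtract each embedding's component along the dataset direction.
    return emb - np.outer(emb @ w, w)


# Synthetic stand-in for embeddings from two datasets (not real data):
# dataset 0 is shifted along the first axis to mimic a dataset-specific bias.
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 8)) + 3.0 * np.eye(8)[0]
b = rng.normal(size=(50, 8))
emb = np.vstack([a, b])
ids = np.array([0] * 50 + [1] * 50)

cleaned = remove_dataset_direction(emb, ids)
gap_before = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
gap_after = np.linalg.norm(cleaned[:50].mean(axis=0) - cleaned[50:].mean(axis=0))
```

In this toy setup the gap between the two dataset means collapses to numerical zero after projection, while the embedding dimensionality is unchanged, so a downstream instrument classifier could be trained on `cleaned` in place of `emb`. Whether such a projection helps on real embeddings is an empirical question that this sketch does not answer.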
Abstract
Deep neural network models have become the dominant approach to a large variety of tasks within music information retrieval (MIR). These models generally require large amounts of (annotated) training data to achieve high accuracy. Because not all applications in MIR have sufficient quantities of training data, it is becoming increasingly common to transfer models across domains. This approach allows representations derived for one task to be applied to another, and can result in high accuracy with less stringent training data requirements for the downstream task. However, the properties of pre-trained audio embeddings are not fully understood. Specifically, and unlike traditionally engineered features, the representations extracted from pre-trained deep networks may embed and propagate biases from the model's training regime. This work investigates the phenomenon of bias propagation in…
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Speech Recognition and Synthesis
