How Class Ontology and Data Scale Affect Audio Transfer Learning
Manuel Milling, Andreas Triantafyllopoulos, Alexander Gebhard, Simon Rampp, Björn W. Schuller

TL;DR
This study investigates how class ontology and data scale influence audio transfer learning effectiveness across various tasks, highlighting the importance of data similarity over quantity.
Contribution
It provides a comprehensive analysis of transfer learning in audio, emphasizing the roles of data scale and class ontology in model performance.
Findings
Increasing the number of samples and the number of classes in the pre-training data each improves transfer learning.
Similarity between pre-training and downstream tasks significantly enhances transfer effectiveness.
Pre-training on ontology-based subsets benefits downstream task performance.
Abstract
Transfer learning is a crucial concept within deep learning that allows artificial neural networks to benefit from a large pre-training data basis when confronted with a task of limited data. Despite its ubiquitous use and clear benefits, there are still many open questions regarding the inner workings of transfer learning and, in particular, regarding the understanding of when and how well it works. To that end, we perform a rigorous study focusing on audio-to-audio transfer learning, in which we pre-train various model states on (ontology-based) subsets of AudioSet and fine-tune them on three computer audition tasks, namely acoustic scene recognition, bird activity recognition, and speech command recognition. We report that increasing the number of samples and the number of classes in the pre-training data both have a positive impact on transfer learning. This is, however, generally surpassed by…
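The abstract does not say how the ontology-based subsets are built, so the following is only a minimal sketch of one plausible approach: collect every class beneath a chosen root in a class hierarchy, then keep the clips labeled with any of those classes. The entries mimic the `id` / `name` / `child_ids` shape of AudioSet's published ontology file, but the ids, names, clips, and the helper functions `descendants` and `select_subset` are all hypothetical and not taken from the paper.

```python
from collections import deque

# Toy ontology; ids and names are made up for illustration only.
TOY_ONTOLOGY = [
    {"id": "animal", "name": "Animal", "child_ids": ["bird", "dog"]},
    {"id": "bird", "name": "Bird", "child_ids": ["bird_song"]},
    {"id": "bird_song", "name": "Bird vocalization", "child_ids": []},
    {"id": "dog", "name": "Dog", "child_ids": []},
    {"id": "music", "name": "Music", "child_ids": ["guitar"]},
    {"id": "guitar", "name": "Guitar", "child_ids": []},
]

# Toy clip list; each clip may carry several class labels.
TOY_SAMPLES = [
    {"clip": "a.wav", "labels": ["bird_song"]},
    {"clip": "b.wav", "labels": ["guitar"]},
    {"clip": "c.wav", "labels": ["dog", "music"]},
]


def descendants(ontology, root_id):
    """Return the set containing root_id and every class reachable below it."""
    children = {entry["id"]: entry["child_ids"] for entry in ontology}
    seen = {root_id}
    queue = deque([root_id])
    while queue:  # breadth-first walk over the class hierarchy
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def select_subset(samples, class_ids):
    """Keep samples that carry at least one label from class_ids."""
    return [s for s in samples if class_ids & set(s["labels"])]


animal_classes = descendants(TOY_ONTOLOGY, "animal")
animal_subset = select_subset(TOY_SAMPLES, animal_classes)
```

In this sketch a clip with labels from several branches (such as `c.wav`) is kept as soon as one label falls under the chosen root. Whether the study handles multi-label clips this way is not stated in the abstract.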
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
