Who calls the shots? Rethinking Few-Shot Learning for Audio

Yu Wang; Nicholas J. Bryan; Justin Salamon; Mark Cartwright; Juan Pablo Bello

arXiv:2110.09600·cs.SD·October 20, 2021

TL;DR

This paper investigates the unique challenges of few-shot learning in audio recognition, emphasizing the importance of application-specific approaches due to audio's multi-label nature and properties like polyphony and SNR.

Contribution

It introduces two new datasets for audio few-shot learning and provides audio-specific insights that challenge assumptions from image-based few-shot learning research.

Findings

01 No single best model or method for all audio scenarios

02 Support set selection depends on application context

03 Audio properties significantly influence few-shot learning performance

Abstract

Few-shot learning aims to train models that can recognize novel classes given just a handful of labeled examples, known as the support set. While the field has seen notable advances in recent years, these have often focused on multi-class image classification. Audio, in contrast, is often multi-label due to overlapping sounds, resulting in unique properties such as polyphony and signal-to-noise ratio (SNR). This leaves open the question of how such audio properties affect few-shot learning system design, performance, and human-computer interaction, as it is typically up to the user to collect and provide inference-time support set examples. We address these questions through a series of experiments. We introduce two novel datasets, FSD-MIX-CLIPS and FSD-MIX-SED, whose programmatic generation allows us to…
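To make the SNR property mentioned in the abstract concrete, here is a minimal sketch of mixing a foreground sound with a background at a chosen SNR, using the standard definition SNR_dB = 10 · log10(P_foreground / P_background). This is an illustration only and is not taken from the paper or its repository; the function names `mean_power`, `mix_at_snr`, and `measure_snr_db`, the synthetic sine/noise signals, and the 5 dB target are all assumptions made for the example.

```python
import math
import random


def mean_power(signal):
    """Average power of a signal given as a list of float samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(foreground, background, snr_db):
    """Scale the background so foreground/background power matches snr_db, then sum.

    Hypothetical helper for illustration; returns (mixture, scaled_background).
    """
    # Solve SNR_dB = 10 * log10(P_fg / P_bg) for the required background power.
    target_bg_power = mean_power(foreground) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_bg_power / mean_power(background))
    scaled_bg = [gain * x for x in background]
    mixture = [f + b for f, b in zip(foreground, scaled_bg)]
    return mixture, scaled_bg


def measure_snr_db(foreground, background):
    """SNR in decibels between two equal-length signals."""
    return 10 * math.log10(mean_power(foreground) / mean_power(background))


# Synthetic stand-ins: a 440 Hz tone as the "event", Gaussian noise as the background.
random.seed(0)
sample_rate = 16000
fg = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
bg = [random.gauss(0.0, 1.0) for _ in range(sample_rate)]

mixture, scaled_bg = mix_at_snr(fg, bg, snr_db=5.0)
print(round(measure_snr_db(fg, scaled_bg), 3))
```

Lowering `snr_db` makes the background louder relative to the event, which is one way a programmatically generated clip can be made harder to recognize.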

Code & Models

Repositories

wangyu/rethink-audio-fsl (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis