Knowledge Transfer from Weakly Labeled Audio using Convolutional Neural Network for Sound Events and Scenes
Anurag Kumar, Maksim Khadkevich, Christian Fügen

TL;DR
This paper introduces a CNN-based framework for sound event detection and scene classification using weakly labeled web audio data, achieving state-of-the-art results and effective transfer learning.
Contribution
It presents novel methods for transfer learning from weakly labeled audio, enabling effective domain and task adaptation with a CNN model trained on variable-length audio.
Findings
Achieved human-level accuracy on the ESC-50 dataset.
Set new state-of-the-art results on Audioset.
Demonstrated effective semantic representation learning.
Abstract
In this work we propose approaches to effectively transfer knowledge from weakly labeled web audio data. We first describe a convolutional neural network (CNN) based framework for sound event detection and classification using weakly labeled audio data. Our model trains efficiently from audio recordings of variable length; hence, it is well suited for transfer learning. We then propose methods to learn representations using this model which can be effectively used for solving the target task. We study both transductive and inductive transfer learning tasks, showing the effectiveness of our methods for both domain and task adaptation. We show that the representations learned using the proposed CNN model generalize well enough to reach human-level accuracy on the ESC-50 sound events dataset and set state-of-the-art results on this dataset. We further use them for the acoustic scene classification task and…
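To illustrate how a single clip-level (weak) label can supervise recordings of different lengths, here is a minimal sketch. It is not the authors' implementation: `segment_scores` is a hypothetical stand-in for a CNN (it simply averages each window), and the window size, hop size, and use of global max pooling are assumptions chosen for illustration. The point is only that pooling over however many segments a recording yields always produces one clip-level score that can be compared against one weak label.

```python
from typing import List


def segment_scores(frames: List[float], window: int, hop: int) -> List[float]:
    """Score each sliding window of the input.

    Hypothetical stand-in for a CNN: the "score" is just the window mean.
    The number of scores grows with the input length.
    """
    if len(frames) < window:
        raise ValueError("input is shorter than one window")
    last_start = len(frames) - window
    return [
        sum(frames[start:start + window]) / window
        for start in range(0, last_start + 1, hop)
    ]


def clip_score(frames: List[float], window: int = 4, hop: int = 2) -> float:
    """Collapse a variable number of segment scores into one clip-level score.

    Global max pooling (an assumption here) says: the clip contains the
    event if at least one segment does, which matches a weak label that
    marks presence without giving timestamps.
    """
    return max(segment_scores(frames, window, hop))


# Two recordings of different lengths, each containing a short "event".
short_clip = [0.0] * 4 + [1.0] * 4
long_clip = [0.0] * 20 + [1.0] * 4 + [0.0] * 20
silence = [0.0] * 10

for name, clip in [("short", short_clip), ("long", long_clip), ("silence", silence)]:
    n_segments = len(segment_scores(clip, window=4, hop=2))
    print(name, "segments:", n_segments, "clip score:", clip_score(clip))
```

The short and long recordings produce different numbers of segment scores but each reduces to a single value, so the same clip-level loss can be applied regardless of duration.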
