Unsupervised Discriminative Learning of Sounds for Audio Event Classification

Sascha Hornauer; Ke Li; Stella X. Yu; Shabnam Ghaffarzadegan; Liu Ren

arXiv:2105.09279 · cs.SD · May 21, 2021


TL;DR

This paper presents an unsupervised discriminative learning approach for audio event classification that pre-trains models solely on audio data, achieving performance comparable to ImageNet pre-training and enabling efficient cross-dataset knowledge transfer.

Contribution

It introduces a novel unsupervised discriminative pre-training method for audio classification that is faster than ImageNet pre-training and equally effective, reducing reliance on large-scale visual datasets.

Findings

1. Unsupervised audio pre-training matches the performance of ImageNet pre-training.
2. The method enables effective transfer of knowledge across audio datasets.
3. Pre-training on audio alone is faster than ImageNet pre-training and maintains high accuracy.

Abstract

Recent progress in network-based audio event classification has shown the benefit of pre-training models on visual data such as ImageNet. While this process allows knowledge transfer across different domains, training a model on large-scale visual datasets is time-consuming. On several audio event classification benchmarks, we show a fast and effective alternative that pre-trains the model without supervision, only on audio data, and yet delivers on-par performance with ImageNet pre-training. Furthermore, we show that our discriminative audio learning can be used to transfer knowledge across audio datasets and can optionally include ImageNet pre-training.
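The abstract does not spell out the pre-training objective. As a hedged illustration only, the sketch below shows non-parametric instance discrimination, one common form of unsupervised discriminative learning in which every unlabeled training clip is treated as its own class. Whether the paper uses this exact loss is an assumption. The function names, the memory bank with momentum update, the temperature of 0.07 and the momentum of 0.5 are all hypothetical choices, and random vectors stand in for encoder embeddings of audio (e.g. log-mel spectrograms).

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each row of x to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def instance_discrimination_loss(features, memory_bank, indices, temperature=0.07):
    """Hypothetical non-parametric instance-discrimination loss.

    features:    (B, D) embeddings of the current batch of audio clips
    memory_bank: (N, D) unit-norm embeddings of all N training clips
    indices:     (B,) memory-bank slot of each batch clip; this slot acts as
                 the clip's own "class" label, so no human labels are needed
    """
    z = l2_normalize(features)
    logits = z @ memory_bank.T / temperature           # (B, N) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy whose target for clip i is its own memory-bank slot.
    return float(-log_probs[np.arange(len(indices)), indices].mean())


def update_memory(memory_bank, features, indices, momentum=0.5):
    """Blend new embeddings into their bank slots and re-normalize (in place)."""
    z = l2_normalize(features)
    blended = momentum * memory_bank[indices] + (1.0 - momentum) * z
    memory_bank[indices] = l2_normalize(blended)
    return memory_bank


# Usage with random stand-ins for encoder outputs (hypothetical sizes).
rng = np.random.default_rng(0)
N, D, B = 100, 16, 8
bank = l2_normalize(rng.standard_normal((N, D)))
idx = rng.choice(N, size=B, replace=False)
feats = rng.standard_normal((B, D))

loss_random = instance_discrimination_loss(feats, bank, idx)
# Embeddings that already match their own slots should score a lower loss.
loss_matched = instance_discrimination_loss(bank[idx], bank, idx)
bank = update_memory(bank, feats, idx)
```

Treating each clip as its own class lets the encoder learn features that separate individual sounds without any labels; those features could then be fine-tuned on a labeled audio event benchmark, which is the role the abstract assigns to audio-only pre-training.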
