Unsupervised Discriminative Learning of Sounds for Audio Event Classification
Sascha Hornauer, Ke Li, Stella X. Yu, Shabnam Ghaffarzadegan, Liu Ren

TL;DR
This paper presents an unsupervised discriminative learning approach for audio event classification that pre-trains models solely on audio data, achieving comparable performance to ImageNet pre-training and enabling efficient cross-dataset knowledge transfer.
Contribution
It introduces a novel unsupervised discriminative pre-training method for audio event classification that is fast and effective, reducing reliance on large-scale visual datasets such as ImageNet.
Findings
Unsupervised audio-only pre-training performs on par with ImageNet pre-training on several audio event classification benchmarks.
The method enables effective transfer of knowledge across audio datasets.
Pre-training on audio alone is faster than pre-training on a large-scale visual dataset and maintains comparable accuracy.
Abstract
Recent progress in network-based audio event classification has shown the benefit of pre-training models on visual data such as ImageNet. While this process allows knowledge transfer across different domains, training a model on large-scale visual datasets is time-consuming. On several audio event classification benchmarks, we show a fast and effective alternative that pre-trains the model unsupervised, only on audio data, and yet delivers on-par performance with ImageNet pre-training. Furthermore, we show that our discriminative audio learning can be used to transfer knowledge across audio datasets and optionally include ImageNet pre-training.
