Rethinking CNN Models for Audio Classification

Kamalesh Palanisamy; Dipika Singhania; Angela Yao

arXiv:2007.11154·cs.CV·November 17, 2020·109 cites

TL;DR

This paper demonstrates that ImageNet-pretrained CNNs are effective for audio classification using spectrograms, achieving state-of-the-art results and highlighting the benefits of transfer learning and ensemble methods.

Contribution

The paper systematically studies how well ImageNet-pretrained CNN weights transfer to audio spectrograms and shows that these models are effective for audio classification tasks.
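
An ImageNet CNN expects a normalized 3-channel image, so a spectrogram has to be shaped like one before transfer learning can apply. The sketch below is a minimal illustration under stated assumptions, not the authors' preprocessing. It computes a log-magnitude STFT with NumPy, rescales it to [0, 1], replicates it across three channels, and applies the ImageNet mean/std normalization commonly used with torchvision models. The window size (`n_fft=512`), hop (`128`), Hann window, and the helper names `log_spectrogram` and `spectrogram_to_image` are hypothetical choices made for illustration.

```python
import numpy as np

# Assumed preprocessing: channel statistics commonly used with torchvision ImageNet models.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)


def log_spectrogram(signal, n_fft=512, hop=128, eps=1e-10):
    """Log-magnitude STFT of a 1-D signal, shaped (frequency bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (frames, n_fft // 2 + 1)
    return np.log(magnitude.T + eps)


def spectrogram_to_image(spec):
    """Rescale to [0, 1], copy into 3 channels, and apply ImageNet normalization."""
    scaled = (spec - spec.min()) / (spec.max() - spec.min() + 1e-12)
    image = np.repeat(scaled[np.newaxis, :, :], 3, axis=0)  # (3, freq, frames)
    return (image - IMAGENET_MEAN) / IMAGENET_STD


# Usage: a 1-second 440 Hz tone sampled at 16 kHz.
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
spec = log_spectrogram(tone)
image = spectrogram_to_image(spec)
print(spec.shape, image.shape)  # (257, 122) (3, 257, 122)
```

For this tone the spectrogram energy peaks at frequency bin 14 (440 Hz divided by 16000 / 512 = 31.25 Hz per bin). Copying one spectrogram into all three channels is only one option; other pipelines may use mel filterbanks or fill the channels with different time-frequency representations.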

Findings

1. For a given standard architecture, ImageNet-pretrained weights outperform random initialization.
2. Gradient visualizations reveal what CNNs learn from spectrograms.
3. Ensembling models improves accuracy significantly.
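
The ensemble finding can be illustrated with soft voting, one common way to combine classifiers: average the softmax probabilities from several independently trained runs and take the argmax. This summary does not state how the authors combined their models, so treat the scheme as an assumption. The logits below are synthetic and do not come from the paper.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def ensemble_predict(logits_per_run):
    """Soft voting: average class probabilities over runs.

    logits_per_run has shape (n_runs, n_samples, n_classes); returns the
    averaged probabilities (n_samples, n_classes) and the predicted labels.
    """
    probs = softmax(np.asarray(logits_per_run, dtype=float)).mean(axis=0)
    return probs, probs.argmax(axis=-1)


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return float(np.mean(np.asarray(predictions) == np.asarray(labels)))


# Synthetic example (not data from the paper): 3 runs, 3 samples, 3 classes.
# Each row lists the class a run strongly favours for samples 0, 1, 2.
labels = np.array([0, 1, 2])
favoured = np.array([[0, 1, 0],   # run A is wrong on sample 2
                     [0, 2, 2],   # run B is wrong on sample 1
                     [1, 1, 2]])  # run C is wrong on sample 0
logits = 10.0 * np.eye(3)[favoured]  # shape (3, 3, 3)

for run_logits in logits:
    print(accuracy(softmax(run_logits).argmax(axis=-1), labels))
probs, preds = ensemble_predict(logits)
print(preds)  # [0 1 2]
```

Each run classifies 2 of the 3 samples correctly, but because the runs make different mistakes, the averaged probabilities classify all three correctly. Diverse errors across runs are what averaging cancels out.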

Abstract

In this paper, we show that ImageNet-pretrained standard deep CNN models can be used as strong baseline networks for audio classification. Even though audio spectrograms differ significantly from standard ImageNet image samples, transfer learning assumptions still hold firmly. To understand what enables ImageNet-pretrained models to learn useful audio representations, we systematically study how much of the pretrained weights is useful for learning spectrograms. We show (1) that, for a given standard model, using pretrained weights is better than using randomly initialized weights, and (2) qualitative results of what the CNNs learn from the spectrograms by visualizing the gradients. In addition, we show that even though we use the pretrained model weights for initialization, performance varies across different runs of the same model. This variance in performance…
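
Point (2) of the abstract relies on visualizing gradients. This summary does not say which visualization method the authors used, so the sketch below shows the simplest variant, a vanilla gradient saliency map. It computes the absolute gradient of one class score with respect to each time-frequency bin of the input, which indicates which regions most affect that prediction. To stay self-contained, it uses a hypothetical two-layer ReLU network with random weights in place of an ImageNet CNN and derives the gradient by hand. With a real model, a deep learning framework's autograd would compute the same quantity.

```python
import numpy as np


def forward(x, w1, w2):
    """Toy two-layer ReLU classifier: x (d,) -> hidden (h,) -> scores (c,).

    Returns the class scores and the hidden pre-activations (needed for the gradient).
    """
    pre = w1 @ x
    hidden = np.maximum(pre, 0.0)
    return w2 @ hidden, pre


def saliency(x, w1, w2, target):
    """Vanilla gradient saliency: |d scores[target] / d x| for one input."""
    _, pre = forward(x, w1, w2)
    relu_grad = (pre > 0).astype(float)     # ReLU derivative at each hidden unit
    grad = w1.T @ (relu_grad * w2[target])  # chain rule back to the input
    return np.abs(grad)


# Hypothetical 8 x 5 "spectrogram" (frequency bins x frames), flattened for the toy model.
rng = np.random.default_rng(0)
n_freq, n_frames = 8, 5
spec = rng.normal(size=(n_freq, n_frames))
w1 = rng.normal(size=(16, n_freq * n_frames))
w2 = rng.normal(size=(3, 16))

saliency_map = saliency(spec.ravel(), w1, w2, target=1).reshape(n_freq, n_frames)
print(saliency_map.round(2))
```

Large values in `saliency_map` mark the frequency bins and frames where small changes most affect the target score. When overlaid on the spectrogram, the map shows which parts of the sound drive the prediction. Because the toy network is piecewise linear, the hand-derived gradient can be checked against central finite differences.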

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies

Methods: Max Pooling · Batch Normalization · 1x1 Convolution · Concatenated Skip Connection · Average Pooling · Dense Connections · Kaiming Initialization · Dropout · Dense Block