Rethinking CNN Models for Audio Classification
Kamalesh Palanisamy, Dipika Singhania, Angela Yao

TL;DR
This paper demonstrates that ImageNet-pretrained CNNs are effective audio classifiers when fine-tuned on spectrograms. They achieve state-of-the-art results, which highlights the benefits of transfer learning and ensembling.
Contribution
The paper systematically studies how well ImageNet-pretrained CNNs transfer to audio spectrograms, including how much of the pretrained weights is useful. It shows that these models are effective for audio classification tasks.
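This page does not state how spectrograms are prepared for an ImageNet-pretrained CNN. The following is a minimal sketch of one common recipe, which is assumed here rather than confirmed as the paper's method:

- convert power to decibels;
- min-max scale the result to [0, 1];
- copy the single channel into the three RGB channels;
- normalize each channel with the ImageNet statistics used by torchvision.

The function names `power_to_db` and `to_imagenet_input` are hypothetical. Nested lists stand in for arrays so the example needs only the standard library.

```python
import math

# ImageNet per-channel mean/std as commonly used with torchvision models.
# Assumption: the paper's exact normalization is not given on this page.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def power_to_db(power, eps=1e-10):
    """Convert a power spectrogram (list of rows) to decibels.
    eps floors zero-power bins so log10 stays finite."""
    return [[10.0 * math.log10(max(p, eps)) for p in row] for row in power]


def to_imagenet_input(spec_db):
    """Min-max scale a dB spectrogram to [0, 1], replicate it into three
    channels, and apply ImageNet mean/std normalization per channel.
    Returns a nested list shaped [3][n_rows][n_cols]."""
    flat = [v for row in spec_db for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant input
    scaled = [[(v - lo) / span for v in row] for row in spec_db]
    return [
        [[(v - m) / s for v in row] for row in scaled]
        for m, s in zip(IMAGENET_MEAN, IMAGENET_STD)
    ]
```

In a real pipeline, the resulting 3 × F × T input would go to a pretrained backbone. Replicating the single channel is what lets the first convolution reuse its three-channel ImageNet filters unchanged.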
Findings
For a given architecture, ImageNet-pretrained weights outperform random initialization.
Gradient visualizations reveal what the CNNs learn from spectrograms.
Ensemble models improve accuracy significantly.
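The page does not say how the ensemble combines its members. The following minimal sketch assumes one common choice: average the softmax probabilities of several models, or of several independent training runs of one model, and take the argmax. `softmax` and `ensemble_predict` are hypothetical helper names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits):
    """Average class probabilities across ensemble members.
    per_model_logits: one list of class logits per member.
    Returns (predicted_class_index, averaged_probabilities)."""
    probs = [softmax(logits) for logits in per_model_logits]
    n_members = len(probs)
    n_classes = len(probs[0])
    avg = [sum(p[k] for p in probs) / n_members for k in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__), avg
```

Consider `ensemble_predict([[2.0, 1.0, 0.0], [0.0, 2.5, 0.0], [1.5, 1.4, 0.0]])`. It picks class 1 even though two of the three members individually favor class 0, because the second member is far more confident. A majority vote would behave differently.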
Abstract
In this paper, we show that ImageNet-pretrained standard deep CNN models can be used as strong baseline networks for audio classification. Even though audio spectrograms differ significantly from standard ImageNet image samples, transfer learning assumptions still hold firmly. To understand what enables ImageNet-pretrained models to learn useful audio representations, we systematically study how much of the pretrained weights is useful for learning spectrograms. We show (1) that for a given standard model, using pretrained weights is better than using randomly initialized weights, and (2) qualitative results of what the CNNs learn from the spectrograms, obtained by visualizing the gradients. In addition, we show that even though we use pretrained model weights for initialization, performance varies across different runs of the same model. This variance in performance…
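Gradient visualization of this kind amounts to a saliency map: the magnitude of the gradient of a class score with respect to each time-frequency bin of the input spectrogram. Frameworks normally compute this with backpropagation, for example by calling `backward()` on the score in PyTorch. The dependency-free sketch below swaps in central finite differences instead. `toy_class_score` is a hypothetical stand-in for a trained CNN, not the paper's model. Rows are assumed to index frequency bins and columns to index time frames.

```python
def toy_class_score(spec):
    """Hypothetical stand-in for a CNN's class logit: a fixed weighted sum of
    squared bin values that responds mostly to the first (lowest) frequency row."""
    weights = [[2.0, 2.0, 2.0], [0.1, 0.1, 0.1]]
    return sum(
        w * v * v
        for w_row, v_row in zip(weights, spec)
        for w, v in zip(w_row, v_row)
    )


def saliency_map(score_fn, spec, h=1e-4):
    """Approximate |d score / d bin| for every bin of a spectrogram given as a
    list of rows, using central finite differences with step h.
    Cost is two score evaluations per bin, so this is for illustration only."""
    saliency = []
    for i, row in enumerate(spec):
        sal_row = []
        for j in range(len(row)):
            # Perturb one bin up and down on copies, leaving the input untouched.
            plus = [r[:] for r in spec]
            minus = [r[:] for r in spec]
            plus[i][j] += h
            minus[i][j] -= h
            grad = (score_fn(plus) - score_fn(minus)) / (2.0 * h)
            sal_row.append(abs(grad))
        saliency.append(sal_row)
    return saliency
```

The toy score is quadratic, so its saliency at each bin is 2·w·v. The map therefore depends on both the weights and the input values, just as gradient maps of nonlinear CNNs vary from one spectrogram to the next.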
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies
Methods: Max Pooling · Batch Normalization · 1x1 Convolution · Concatenated Skip Connection · Average Pooling · Dense Connections · Kaiming Initialization · Dropout · Dense Block
