Over-Parameterization and Generalization in Audio Classification
Khaled Koutini, Hamid Eghbal-zadeh, Florian Henkel, Jan Schlüter, Gerhard Widmer

TL;DR
This paper investigates how over-parameterization affects the generalization of CNNs in audio classification, and finds that increasing model width improves robustness to unseen recording devices, even when the number of parameters is not increased.
Contribution
It demonstrates that scaling CNN width improves generalization to unseen recording devices, pointing to a way to make acoustic scene classification models more robust.
Findings
Increasing CNN width improves generalization to unseen devices.
The robustness gain comes from width itself rather than from parameter count: it holds even when the number of parameters is not increased.
Scaling CNN depth has different effects on generalization.
Abstract
Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing. In machine listening, while generally exhibiting very good generalization capabilities, CNNs are sensitive to the specific audio recording device used, which has been recognized as a substantial problem in the acoustic scene classification (DCASE) community. In this study, we investigate the relationship between over-parameterization of acoustic scene classification models, and their resulting generalization abilities. Specifically, we test scaling CNNs in width and depth, under different conditions. Our results indicate that increasing width improves generalization to unseen devices, even without an increase in the number of parameters.
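The abstract does not say how width is increased without adding parameters. As a rough illustration only, the sketch below assumes one common mechanism, grouped convolutions, and counts the weights of a single 2D convolution layer before and after widening. The helper names `conv2d_weight_count` and `widen_with_groups`, the channel sizes, and the choice of `groups = width_factor ** 2` are all hypothetical and are not taken from the paper.

```python
def conv2d_weight_count(c_in, c_out, kernel_size, groups=1):
    """Number of weights in a 2D convolution layer (bias terms omitted)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * kernel_size * kernel_size


def widen_with_groups(c_in, c_out, kernel_size, width_factor):
    """Hypothetical helper (an assumption, not the paper's method):
    multiply both channel dimensions by width_factor and split the layer
    into width_factor**2 groups so the weight count stays unchanged."""
    groups = width_factor ** 2
    wide_in, wide_out = c_in * width_factor, c_out * width_factor
    count = conv2d_weight_count(wide_in, wide_out, kernel_size, groups)
    return wide_in, wide_out, groups, count


# Illustrative layer sizes, chosen arbitrarily.
baseline = conv2d_weight_count(64, 128, 3)
wide_in, wide_out, groups, wide = widen_with_groups(64, 128, 3, 2)
ungrouped_wide = conv2d_weight_count(wide_in, wide_out, 3)

print(baseline, wide, ungrouped_wide)
```

Under these assumptions the widened, grouped layer has twice as many channels on each side but the same number of weights as the baseline, whereas widening without grouping would quadruple the weight count.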
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
