Over-Parameterization and Generalization in Audio Classification

Khaled Koutini; Hamid Eghbal-zadeh; Florian Henkel; Jan Schlüter; Gerhard Widmer

arXiv:2107.08933 · cs.SD · July 20, 2021

Open Access

TL;DR

This paper investigates how over-parameterization affects CNN generalization in audio classification, finding that increasing model width improves generalization to unseen recording devices, even when the number of parameters does not increase.

Contribution

The paper shows that scaling CNN width improves generalization to unseen recording devices, which suggests width scaling as a way to make acoustic scene classification models more robust to device mismatch.

Findings

01

Increasing CNN width improves generalization to unseen devices.

02

The gain comes from width itself rather than parameter count: it holds even without an increase in the number of parameters.

03

Scaling CNN depth affects generalization differently from scaling width.

Abstract

Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing. In machine listening, while generally exhibiting very good generalization capabilities, CNNs are sensitive to the specific audio recording device used, which has been recognized as a substantial problem in the acoustic scene classification (DCASE) community. In this study, we investigate the relationship between over-parameterization of acoustic scene classification models, and their resulting generalization abilities. Specifically, we test scaling CNNs in width and depth, under different conditions. Our results indicate that increasing width improves generalization to unseen devices, even without an increase in the number of parameters.
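
The abstract does not say how width is increased while the parameter count stays fixed. As a minimal sketch of one way this can happen, and not necessarily the authors' method, the Python below counts the weights of a 2D convolution layer and shows that doubling the channel width while using grouped convolution with 4 groups leaves the count unchanged. The function name `conv2d_weight_count`, the channel sizes, and the choice of grouped convolution are all assumptions made for illustration.

```python
def conv2d_weight_count(c_in, c_out, kernel_size, groups=1):
    """Number of weights (bias excluded) in a 2D convolution layer.

    Hypothetical helper for illustration. With grouped convolution,
    each output channel only sees c_in / groups input channels.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("groups must divide both c_in and c_out")
    return (c_in // groups) * c_out * kernel_size * kernel_size


# Baseline layer: 64 -> 64 channels, 3x3 kernel.
base = conv2d_weight_count(64, 64, 3)

# Doubling the width of a dense layer quadruples the weight count.
wide = conv2d_weight_count(128, 128, 3)

# Assumption: splitting the widened layer into 4 groups restores the
# baseline weight count while keeping twice as many channels.
wide_grouped = conv2d_weight_count(128, 128, 3, groups=4)

print(base, wide, wide_grouped)
```

Scaling both input and output channels by a factor w multiplies the weight count by w², so a dense layer needs w² groups (here 2² = 4) to stay at the baseline count.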

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis