Receptive Field Regularization Techniques for Audio Classification and Tagging with Deep Convolutional Neural Networks

Khaled Koutini; Hamid Eghbal-zadeh; Gerhard Widmer

arXiv:2105.12395 · cs.SD · May 27, 2021

TL;DR

This paper investigates how tuning the receptive field (RF) of CNNs affects their ability to generalize in audio classification and tagging tasks, and proposes methods to control the RF for improved performance.

Contribution

The paper introduces systematic approaches to control CNN receptive fields, demonstrating significant improvements in audio classification and tagging accuracy over existing models.

Findings

01 RF regularization improves model generalization

02 The proposed RF-regularized models outperform more complex architectures

03 The resulting models achieve state-of-the-art results in multiple audio tasks

Abstract

In this paper, we study the performance of variants of well-known Convolutional Neural Network (CNN) architectures on different audio tasks. We show that tuning the Receptive Field (RF) of CNNs is crucial to their generalization. An insufficient RF limits the CNN's ability to fit the training data. In contrast, CNNs with an excessive RF tend to over-fit the training data and fail to generalize to unseen testing data. As state-of-the-art CNN architectures, in computer vision and other domains, tend to go deeper in terms of number of layers, their RF size increases and therefore they degrade in performance in several audio classification and tagging tasks. We study well-known CNN architectures and how their building blocks affect their receptive field. We propose several systematic approaches to control the RF of CNNs and systematically test the resulting architectures on different audio…
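The link between depth and RF size mentioned in the abstract can be made concrete with the standard receptive-field recurrence for stacked convolutions: each layer with kernel size k grows the RF by (k - 1) times the cumulative stride of the layers before it. The sketch below is an illustration only, not code from the paper or its repository. The function name `receptive_field` and the three layer stacks are hypothetical, and the use of 1x1 kernels in later layers is shown simply as one way to stop RF growth while keeping depth.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Receptive field along one axis of a stack of conv/pool layers.

    `layers` lists (kernel_size, stride) pairs in input-to-output order.
    Dilation and padding are ignored in this simplified sketch.
    """
    rf = 1    # a single input sample covers only itself
    jump = 1  # spacing, in input samples, between adjacent units of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra kernel tap reaches `jump` samples further
        jump *= stride             # strides compound the spacing for later layers
    return rf


# Hypothetical layer stacks, chosen only to illustrate the trend.
shallow = [(3, 1), (3, 2), (3, 1)]
deep = shallow + [(3, 2), (3, 1), (3, 1)]    # more 3-wide layers: RF keeps growing
capped = shallow + [(1, 2), (1, 1), (1, 1)]  # same depth, 1-wide kernels: RF stops growing

print(receptive_field(shallow), receptive_field(deep), receptive_field(capped))
```

With these made-up stacks, the deeper network sees a much wider span of the input than the shallow one, while the variant with 1-wide kernels has the same depth as the deep network but the same RF as the shallow one.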

Code & Models

Repositories

kkoutini/cpjku_dcase20
PyTorch · Official
