TL;DR
This paper investigates how restricting the receptive field of CNNs affects acoustic scene classification, introduces Frequency Aware CNNs to recover lost frequency information, and demonstrates improved performance on DCASE 2019 tasks.
Contribution
It systematically studies RF configurations in CNNs for audio tasks and proposes Frequency Aware CNNs to enhance performance when RF is restricted.
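As a rough illustration of what an "RF configuration" means in practice, the sketch below computes the receptive field of a stack of convolution and pooling layers along one axis, using the standard recurrence (each layer adds `(kernel - 1)` times the accumulated stride). The function name `receptive_field` and both layer stacks are hypothetical and are not taken from the paper; they only show how swapping 3x3 kernels for 1x1 kernels shrinks the RF.

```python
def receptive_field(layers):
    """Receptive field along one axis for a stack of layers.

    `layers` is a sequence of (kernel_size, stride) pairs in forward order.
    Dilation and padding are ignored in this simplified sketch.
    """
    rf = 1    # a single input position before any layer is applied
    jump = 1  # distance in input positions between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks (not from the paper): conv, conv, pool, conv, conv.
wide = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1)]
# Same depth, but three of the 3x3 convolutions are replaced by 1x1.
narrow = [(3, 1), (1, 1), (2, 2), (1, 1), (1, 1)]

print(receptive_field(wide))    # 14
print(receptive_field(narrow))  # 4
```

Replacing kernels with 1x1 convolutions keeps the depth and channel counts unchanged while limiting how much of the spectrogram each output unit can see, which is one common way to vary the RF in such studies.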
Findings
Restricting the RF of a CNN affects its performance, robustness, and generalization on audio tasks.
Frequency Aware CNNs improve classification accuracy.
Several submissions achieved top results in the DCASE 2019 Challenge.
Abstract
Acoustic scene classification and related tasks have been dominated by Convolutional Neural Networks (CNNs). Top-performing CNNs mainly use audio spectrograms as input and borrow their architectural design primarily from computer vision. A recent study has shown that restricting the receptive field (RF) of CNNs in appropriate ways is crucial for their performance, robustness and generalization in audio tasks. One side effect of restricting the RF of CNNs is that more frequency information is lost. In this paper, we first perform a systematic investigation of different RF configurations for various CNN architectures on the DCASE 2019 Task 1.A dataset. Second, we introduce Frequency Aware CNNs to compensate for the lack of frequency information caused by the restricted RF, and experimentally determine if and in what RF ranges they yield additional improvement. The result of these investigations…
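The abstract does not spell out how a Frequency Aware CNN recovers frequency information. One plausible mechanism, assumed here purely for illustration, is to append an extra channel that encodes each position's frequency-bin index, so that a layer with a small RF can still tell low bins from high bins. The helper `add_frequency_channel`, the `[channels][freq_bins][time_frames]` layout, and the scaling to the range -1 to 1 are all hypothetical choices.

```python
def add_frequency_channel(feature_map):
    """Append a frequency-position channel to a feature map.

    `feature_map` is a nested list laid out as [channels][freq_bins][time_frames]
    (an assumed layout). The new channel holds, at every time frame, the
    frequency-bin index scaled linearly to the range [-1, 1].
    """
    n_freq = len(feature_map[0])
    n_time = len(feature_map[0][0])
    denom = max(n_freq - 1, 1)  # avoid division by zero for a single bin
    freq_channel = [
        [-1.0 + 2.0 * f / denom] * n_time
        for f in range(n_freq)
    ]
    # Return a new list; the input feature map is left unmodified.
    return [*feature_map, freq_channel]


# Toy input: 1 channel, 3 frequency bins, 4 time frames.
x = [[[0.0] * 4 for _ in range(3)]]
out = add_frequency_channel(x)
print(len(out))  # 2 channels: the original plus the frequency channel
print(out[1])    # rows of -1.0, 0.0 and 1.0, one row per frequency bin
```

In a real network the same idea would be applied to tensors before a convolution layer; plain lists are used here only to keep the sketch free of dependencies.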
