Deep Convolutional Neural Network with Mixup for Environmental Sound Classification
Zhichao Zhang, Shugong Xu, Shan Cao, and Shunqing Zhang

TL;DR
This paper introduces a deep convolutional neural network trained with mixup data augmentation for environmental sound classification (ESC). It reports state-of-the-art accuracy on UrbanSound8K and competitive performance on ESC-50 and ESC-10.
Contribution
The paper proposes a CNN architecture for ESC tasks. The network stacks convolutional and pooling layers over spectrogram-like features and is combined with mixup data augmentation.
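The architecture is described here only as stacked convolutional and pooling layers over spectrogram-like features. The NumPy sketch below illustrates that general pattern (conv, ReLU, max-pool, repeated), not the authors' actual network. The following details are all illustrative assumptions: the number of layers, the channel widths, the 3×3 kernels, the 2×2 pooling, the 64×128 input size, the global-average-pooling head, and the helper names.

```python
import numpy as np

def conv2d_valid(x, kernels, bias):
    """Naive 'valid' 2-D convolution (cross-correlation, as in most DL libraries).
    x: (C_in, H, W); kernels: (C_out, C_in, kH, kW); bias: (C_out,)."""
    c_out, _, kh, kw = kernels.shape
    _, h, w = x.shape
    out = np.zeros((c_out, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]  # (C_in, kH, kW)
            # Sum over input channels and kernel area for every output channel.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2])) + bias
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    c, h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:, :h2 * size, :w2 * size]
    return x.reshape(c, h2, size, w2, size).max(axis=(2, 4))

def forward(spec, conv_layers, w_out, b_out):
    """Hypothetical stack of conv -> ReLU -> max-pool blocks, followed by global
    average pooling and a linear layer with softmax."""
    x = spec[np.newaxis]  # (1, n_mels, n_frames): one input channel
    for kernels, bias in conv_layers:
        x = max_pool2d(relu(conv2d_valid(x, kernels, bias)))
    features = x.mean(axis=(1, 2))        # global average pooling -> (C_last,)
    logits = w_out @ features + b_out     # (n_classes,)
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return x, exp / exp.sum()

rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 128))  # hypothetical 64-band x 128-frame spectrogram patch
conv_layers = [
    (rng.standard_normal((8, 1, 3, 3)) * 0.1, np.zeros(8)),
    (rng.standard_normal((16, 8, 3, 3)) * 0.1, np.zeros(16)),
]
w_out = rng.standard_normal((10, 16)) * 0.1  # 10 output classes, as in UrbanSound8K
b_out = np.zeros(10)

feature_map, probs = forward(spec, conv_layers, w_out, b_out)
print(feature_map.shape)  # (16, 14, 30)
```

Each conv/pool block shrinks the time-frequency map while increasing the channel count, so later layers see higher-level features. The explicit loops are there for readability only. A practical system would use a deep learning framework's optimized convolution and pooling layers.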
Findings
Achieved 83.7% accuracy on UrbanSound8K
Demonstrated the effectiveness of mixup in improving classification performance
Achieved competitive results on the ESC-50 and ESC-10 datasets
Abstract
Environmental sound classification (ESC) is an important and challenging problem. In contrast to speech, sound events have a noise-like nature and may be produced by a wide variety of sources. In this paper, we propose to use a novel deep convolutional neural network for ESC tasks. Our network architecture uses stacked convolutional and pooling layers to extract high-level feature representations from spectrogram-like features. Furthermore, we apply mixup to ESC tasks and explore its impact on classification performance and feature distribution. Experiments were conducted on the UrbanSound8K, ESC-50, and ESC-10 datasets. Our experimental results demonstrate that our ESC system achieves state-of-the-art performance (83.7%) on UrbanSound8K and competitive performance on ESC-50 and ESC-10.
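The abstract does not spell out how mixup is applied. The sketch below assumes the standard mixup formulation. Under that formulation, each training example x_i is blended with a randomly chosen partner x_j as λx_i + (1 − λ)x_j, the one-hot labels are blended with the same λ, and λ is drawn from Beta(α, α). The value α = 0.2, the batch shapes, and the helper names `mixup_batch` and `soft_cross_entropy` are hypothetical choices for illustration. They are not settings reported by the authors.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Standard mixup on one batch (alpha=0.2 is an assumed value, not the paper's).
    x: (batch, ...) features; y: (batch, n_classes) one-hot labels."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # mixing weight in [0, 1]
    perm = rng.permutation(x.shape[0])  # random partner for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]  # labels mixed with the same weight
    return x_mix, y_mix, lam

def soft_cross_entropy(logits, targets):
    """Mean cross-entropy against soft (mixed) target distributions."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64, 128))  # hypothetical batch of 4 spectrogram patches
labels = np.array([0, 3, 3, 9])
y = np.eye(10)[labels]                 # one-hot labels over 10 classes
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.2, rng=rng)
logits = np.zeros((4, 10))             # stand-in for network outputs
loss = soft_cross_entropy(logits, y_mix)
```

The labels are mixed with the same λ as the inputs, so training needs a loss that accepts soft targets. For that reason the sketch computes cross-entropy against the mixed label vectors rather than integer class indices.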
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Animal Vocal Communication and Behavior
