Acoustic scene classification using convolutional neural network and multiple-width frequency-delta data augmentation
Yoonchang Han, Kyogu Lee

TL;DR
This paper enhances acoustic scene classification by applying convolutional neural networks with a novel multiple-width frequency-delta data augmentation and a specialized output aggregation method, achieving higher accuracy than previous approaches.
Contribution
It introduces a multiple-width frequency-delta (MWFD) data augmentation technique and a folded mean aggregation method, which together improve ConvNet performance on acoustic scene classification.
Findings
The ConvNet outperforms the baseline built on hand-crafted features by 7%.
MWFD augmentation improves accuracy by 5.7%.
The system achieves 83.1% classification accuracy on the DCASE 2016 dataset.
Abstract
In recent years, neural network approaches have shown superior performance to conventional hand-made features in numerous application areas. In particular, convolutional neural networks (ConvNets) exploit spatially local correlations across input data to improve the performance of audio processing tasks, such as speech recognition, musical chord recognition, and onset detection. Here we apply ConvNet to acoustic scene classification, and show that the error rate can be further decreased by using delta features in the frequency domain. We propose a multiple-width frequency-delta (MWFD) data augmentation method that uses static mel-spectrogram and frequency-delta features as individual input examples. In addition, we describe a ConvNet output aggregation method designed for MWFD augmentation, folded mean aggregation, which combines output probabilities of static and MWFD features from the…
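The abstract describes treating the static mel-spectrogram and several frequency-delta versions of it as separate input examples. A minimal sketch of that idea follows, under stated assumptions: the regression-style delta formula, the edge padding, the window widths (3, 5, 9), and the function names are illustrative choices, not taken from the paper. The last helper is a plain mean over per-input class probabilities; it is a stand-in, not the paper's folded mean aggregation, whose definition is cut off in the abstract above.

```python
import numpy as np

def frequency_delta(mel_spec, width):
    """Regression-style delta along the frequency axis (axis 0).

    mel_spec: array of shape (n_mels, n_frames).
    width: odd window size >= 3 (assumed parameterization).
    """
    if width < 3 or width % 2 == 0:
        raise ValueError("width must be an odd integer >= 3")
    half = width // 2
    n_mels = mel_spec.shape[0]
    # Repeat the edge bins so every mel bin has neighbours on both sides.
    padded = np.pad(mel_spec, ((half, half), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, half + 1))
    delta = np.zeros_like(mel_spec, dtype=float)
    for n in range(1, half + 1):
        upper = padded[half + n : half + n + n_mels]
        lower = padded[half - n : half - n + n_mels]
        delta += n * (upper - lower)
    return delta / denom

def mwfd_augment(mel_spec, widths=(3, 5, 9)):
    """Return the static input plus one frequency-delta input per width.

    The widths are hypothetical defaults chosen for illustration.
    """
    return [mel_spec] + [frequency_delta(mel_spec, w) for w in widths]

def average_probabilities(prob_list):
    """Plain mean of class-probability vectors (simplified stand-in)."""
    return np.mean(np.stack(prob_list, axis=0), axis=0)
```

Each array returned by `mwfd_augment` has the same shape as the input, so a single network can consume all of them as individual training examples. At test time, the per-input probability vectors for one clip could be passed to `average_probabilities` to obtain a single prediction.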
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
