Acoustic scene classification using convolutional neural network and multiple-width frequency-delta data augmentation

Yoonchang Han; Kyogu Lee

arXiv:1607.02383·cs.SD·July 11, 2016·35 cites

Open Access

TL;DR

This paper enhances acoustic scene classification by applying convolutional neural networks with a novel multiple-width frequency-delta data augmentation and a specialized output aggregation method, achieving higher accuracy than previous approaches.

Contribution

The paper introduces multiple-width frequency-delta (MWFD) data augmentation and a folded mean aggregation method for combining ConvNet outputs, which together improve ConvNet performance on acoustic scene classification.

Findings

01

The ConvNet outperforms the baseline built on hand-crafted features by 7%.

02

MWFD augmentation improves accuracy by 5.7%.

03

The method achieves 83.1% classification accuracy on the DCASE 2016 dataset.

Abstract

In recent years, neural network approaches have shown superior performance to conventional hand-made features in numerous application areas. In particular, convolutional neural networks (ConvNets) exploit spatially local correlations across input data to improve the performance of audio processing tasks, such as speech recognition, musical chord recognition, and onset detection. Here we apply ConvNet to acoustic scene classification, and show that the error rate can be further decreased by using delta features in the frequency domain. We propose a multiple-width frequency-delta (MWFD) data augmentation method that uses static mel-spectrogram and frequency-delta features as individual input examples. In addition, we describe a ConvNet output aggregation method designed for MWFD augmentation, folded mean aggregation, which combines output probabilities of static and MWFD features from the…
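The idea of treating the static mel-spectrogram and several frequency-delta versions of it as separate input examples can be illustrated with a minimal sketch. It is not the authors' implementation: the regression-style delta formula, the edge padding, the window widths `(3, 5, 7)`, and the function names `frequency_delta` and `mwfd_examples` are all assumptions made here for illustration, since the excerpt above does not specify them.

```python
import numpy as np


def frequency_delta(mel_spec, width):
    """Delta of a mel-spectrogram taken along the frequency axis (axis 0).

    Uses the common regression-style delta formula, which is an assumption
    here; the paper's exact definition is not given in the excerpt.
    mel_spec: array of shape (n_mels, n_frames).
    width: odd window length >= 3 (hypothetical parameter).
    """
    if width < 3 or width % 2 == 0:
        raise ValueError("width must be an odd integer >= 3")
    mel_spec = np.asarray(mel_spec, dtype=float)
    half = width // 2
    n_mels = mel_spec.shape[0]
    # Edge-pad along frequency so the output keeps the input shape.
    padded = np.pad(mel_spec, ((half, half), (0, 0)), mode="edge")
    delta = np.zeros_like(mel_spec)
    for n in range(1, half + 1):
        upper = padded[half + n : half + n + n_mels]  # bins n steps above
        lower = padded[half - n : half - n + n_mels]  # bins n steps below
        delta += n * (upper - lower)
    denom = 2.0 * sum(n * n for n in range(1, half + 1))
    return delta / denom


def mwfd_examples(mel_spec, widths=(3, 5, 7)):
    """Return the static input plus one frequency-delta input per width.

    Each returned array is meant to be fed to the network as its own
    example; the widths are illustrative, not taken from the paper.
    """
    static = np.asarray(mel_spec, dtype=float)
    return [static] + [frequency_delta(static, w) for w in widths]
```

In this sketch every augmented example keeps the shape of the original mel-spectrogram, so the same network input layer could accept the static and the delta versions alike.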

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies