DD-CNN: Depthwise Disout Convolutional Neural Network for Low-complexity Acoustic Scene Classification

Jingqiao Zhao; Zhen-Hua Feng; Qiuqiang Kong; Xiaoning Song; Xiao-Jun Wu

arXiv:2007.12864·cs.SD·July 28, 2020

Open Access

TL;DR

This paper introduces DD-CNN, a low-complexity neural network that uses depthwise separable convolutions, SpecAugment, and Disout for urban acoustic scene classification, reaching 92.04% accuracy on the DCASE2020 validation set with reduced network complexity.

Contribution

The paper proposes a novel DD-CNN architecture that combines depthwise separable convolutions with data augmentation and regularization techniques for improved acoustic scene classification.

Findings

01 Achieved 92.04% accuracy on the validation set of the DCASE2020 Challenge low-complexity acoustic scene classification task.

02 Reduced network complexity while maintaining high classification performance.

03 Learned discriminative acoustic features from audio fragments.

Abstract

This paper presents a Depthwise Disout Convolutional Neural Network (DD-CNN) for the detection and classification of urban acoustic scenes. Specifically, we use log-mel features as the input representation of the acoustic signals. In the proposed DD-CNN, depthwise separable convolution is used to reduce the network complexity. In addition, SpecAugment and Disout are used to further boost performance. Experimental results demonstrate that our DD-CNN can learn discriminative acoustic characteristics from audio fragments and effectively reduce the network complexity. We applied our DD-CNN to the low-complexity acoustic scene classification task of the DCASE2020 Challenge, where it achieved 92.04% accuracy on the validation set.
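The abstract names SpecAugment as one of the augmentation steps applied to the log-mel input. As a rough illustration of the masking idea only, the sketch below zeroes one random frequency band and one random time span of a spectrogram. It is not the authors' implementation: the function name `mask_spectrogram`, the mask widths, the zero fill value, and the `(n_mels, n_frames)` layout are all assumptions made for this example, and the time-warping step of the original SpecAugment policy is omitted.

```python
import numpy as np


def mask_spectrogram(log_mel, max_freq_width=8, max_time_width=20, rng=None):
    """Return a copy of `log_mel` with one frequency band and one time span zeroed.

    log_mel is assumed to have shape (n_mels, n_frames). The widths are
    hypothetical defaults, not values reported in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = log_mel.copy()  # leave the caller's array untouched
    n_mels, n_frames = out.shape

    # Frequency mask: pick a width f, then a start row so the band fits.
    f = int(rng.integers(0, min(max_freq_width, n_mels) + 1))
    f0 = int(rng.integers(0, n_mels - f + 1))
    out[f0:f0 + f, :] = 0.0

    # Time mask: pick a width t, then a start column so the span fits.
    t = int(rng.integers(0, min(max_time_width, n_frames) + 1))
    t0 = int(rng.integers(0, n_frames - t + 1))
    out[:, t0:t0 + t] = 0.0

    return out


if __name__ == "__main__":
    # Stand-in for a log-mel patch: 40 mel bins by 100 frames of ones.
    features = np.ones((40, 100))
    augmented = mask_spectrogram(features, rng=np.random.default_rng(0))
    print("masked entries:", int((augmented == 0).sum()))
```

In a training loop, a function like this would typically be applied to each log-mel patch on the fly, so the network sees a differently masked version of the same clip in every epoch.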

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis

Methods: Depthwise Convolution · Convolution · Pointwise Convolution · Depthwise Separable Convolution
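To make concrete why the listed methods reduce complexity, the sketch below counts the weights of a standard convolution and of its depthwise separable replacement (a depthwise convolution followed by a 1×1 pointwise convolution). The layer sizes used in the example are hypothetical and are not taken from the paper, and bias terms are ignored for simplicity.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def separable_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise separable convolution (bias ignored).

    Depthwise part: one k x k filter per input channel.
    Pointwise part: a 1 x 1 convolution that mixes channels.
    """
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
    k, c_in, c_out = 3, 64, 128
    standard = standard_conv_params(k, c_in, c_out)
    separable = separable_conv_params(k, c_in, c_out)
    print(standard, separable, round(separable / standard, 4))
    # prints: 73728 8768 0.1189
```

The ratio of the two counts simplifies to 1/C_out + 1/k², so for 3×3 kernels and a wide layer the separable form needs roughly one ninth of the weights.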