Hierarchical learning for DNN-based acoustic scene classification

Yong Xu; Qiang Huang; Wenwu Wang; Mark D. Plumbley

arXiv:1607.03682 · cs.SD · August 16, 2016 · 21 cites

TL;DR

This paper introduces two hierarchical learning methods for DNN-based acoustic scene classification. Both use the taxonomy of environmental sounds, and the final system achieves a 22.9% relative improvement in average classification error over the GMM-based benchmark.

Contribution

The paper proposes hierarchical pre-training and a multi-level objective function, both of which use the hierarchical taxonomy of environmental sounds to improve on the DNN baseline.

Findings

01. 22.9% relative reduction in average scene classification error over the GMM-based benchmark

02. Incorporating the hierarchical taxonomy of environmental sounds improves on the DNN baseline

03. Evaluated on Task 1 of the DCASE 2016 challenge, across four standard folds
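
The "relative" figure in the first finding is a ratio of error rates, not a difference in accuracy points. A minimal sketch of that arithmetic follows; the function name and the error rates are hypothetical, chosen only to illustrate the calculation, and are not the paper's measured values.

```python
def relative_error_reduction(baseline_error: float, system_error: float) -> float:
    """Return the fraction of the baseline error removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - system_error) / baseline_error


# Hypothetical error rates (NOT taken from the paper), for illustration only.
reduction = relative_error_reduction(baseline_error=0.20, system_error=0.15)
print(f"{reduction:.1%}")
```

Under this definition, a system can show a large relative reduction even when the absolute change in error is only a few percentage points.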

Abstract

In this paper, we present a deep neural network (DNN)-based acoustic scene classification framework. Two hierarchical learning methods are proposed to improve the DNN baseline performance by incorporating the hierarchical taxonomy information of environmental sounds. First, the parameters of the DNN are initialized by the proposed hierarchical pre-training. A multi-level objective function is then adopted to add further constraints to the cross-entropy-based loss function. A series of experiments was conducted on Task 1 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge. The final DNN-based system achieved a 22.9% relative improvement in average scene classification error compared with the Gaussian Mixture Model (GMM)-based benchmark system across four standard folds.
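
The abstract says only that a multi-level objective adds constraints to the cross-entropy loss using the taxonomy. One way such an objective could look is sketched below with NumPy. This is an illustration under stated assumptions, not the authors' implementation: the two-level taxonomy `FINE_TO_COARSE`, the separate coarse output, and the `coarse_weight` parameter are all hypothetical.

```python
import numpy as np

# Hypothetical two-level taxonomy (an assumption, not the paper's):
# entry i gives the coarse category of fine scene class i.
FINE_TO_COARSE = np.array([0, 0, 1, 1, 2])  # 5 fine scenes, 3 coarse categories


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the correct labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def multi_level_loss(fine_logits, coarse_logits, fine_labels, coarse_weight=0.5):
    """Fine-level cross-entropy plus a weighted coarse-level cross-entropy.

    Coarse labels are derived from the fine labels through the taxonomy,
    so no extra annotation is needed. coarse_weight is a hypothetical
    hyperparameter for this sketch.
    """
    coarse_labels = FINE_TO_COARSE[fine_labels]
    fine_loss = cross_entropy(softmax(fine_logits), fine_labels)
    coarse_loss = cross_entropy(softmax(coarse_logits), coarse_labels)
    return fine_loss + coarse_weight * coarse_loss


# Usage with random scores standing in for network outputs.
rng = np.random.default_rng(0)
fine_logits = rng.normal(size=(4, 5))
coarse_logits = rng.normal(size=(4, 3))
fine_labels = np.array([0, 2, 4, 1])
print(multi_level_loss(fine_logits, coarse_logits, fine_labels))
```

Setting `coarse_weight` to zero recovers the plain fine-level cross-entropy, so the coarse term acts purely as an additional penalty on predictions that land in the wrong branch of the taxonomy.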

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis