Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation

Hu Hu; Chao-Han Huck Yang; Xianjun Xia; Xue Bai; Xin Tang; Yajian Wang; Shutong Niu; Li Chai; Juanjuan Li; Hongning Zhu; Feng Bao; Yuanjun Zhao; Sabato Marco Siniscalchi; Yannan Wang; Jun Du; Chin-Hui Lee

arXiv:2007.08389 · eess.AS · August 28, 2020 · 46 cites

TL;DR

This paper introduces a two-stage CNN-based system with data augmentation for device-robust acoustic scene classification in the DCASE 2020 Challenge, reaching 81.9% accuracy on Task 1a with model fusion and 96.7% on Task 1b with a compact model.

Contribution

It proposes a two-stage classification approach (three coarse classes, then ten fine-grained classes) combined with data augmentation for device-robust acoustic scene classification, and applies model quantization to meet the low-complexity requirement of Task 1b.

Findings

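01. Achieved 76.9% accuracy on the Task 1a development set with the best single classifier and data augmentation.
02. Attained 81.9% accuracy on the Task 1a development set through fusion of the two-stage classifiers.
03. Reached 96.7% accuracy on the Task 1b development set with a model smaller than 500 KB.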
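
The findings mention model fusion without saying how the model outputs are combined. As an illustration only, the sketch below shows one common late-fusion scheme, a weighted average of per-class posteriors. The function names `fuse_posteriors` and `argmax` and the example probabilities are hypothetical and are not taken from the paper or its repository.

```python
def fuse_posteriors(model_probs, weights=None):
    """Weighted average of per-class probabilities from several models.

    model_probs: list of probability lists, one list per model (assumed layout).
    weights: optional per-model weights; defaults to equal weighting.
    """
    if not model_probs:
        raise ValueError("need at least one model output")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same number of classes")
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("need exactly one weight per model")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Average class by class, normalizing by the total weight.
    return [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total_weight
        for c in range(n_classes)
    ]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical posteriors from three models over three classes.
model_a = [0.6, 0.3, 0.1]
model_b = [0.2, 0.5, 0.3]
model_c = [0.3, 0.4, 0.3]

fused = fuse_posteriors([model_a, model_b, model_c])
predicted_class = argmax(fused)  # class 1, although model_a alone prefers class 0
```

Averaging posteriors rather than taking a majority vote keeps each model's confidence in play, which is why a single confident model cannot override two models that agree.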

Abstract

In this technical report, we present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge. Task 1 comprises two different sub-tasks: (i) Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes, and (ii) Task 1b concerns classification of data into three higher-level classes using low-complexity solutions. For Task 1a, we propose a novel two-stage ASC system that leverages an ad-hoc score combination of two convolutional neural networks (CNNs), which classify the acoustic input into three classes and ten classes, respectively. Four different CNN-based architectures are explored to implement the two-stage classifiers, and several data augmentation techniques are also investigated. For Task 1b, we leverage…
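
To make the two-stage idea concrete, here is a minimal sketch of one plausible score combination: each ten-class probability is multiplied by the probability of its parent three-class category, and the result is renormalized. The abstract does not spell out the exact rule, so the multiplication, the indoor/outdoor/transportation grouping (assumed to follow the DCASE 2020 Task 1b class definition), the function names, and the example numbers are all assumptions.

```python
# Assumed mapping from the ten fine-grained scenes to three coarse classes.
FINE_TO_COARSE = {
    "airport": "indoor",
    "shopping_mall": "indoor",
    "metro_station": "indoor",
    "street_pedestrian": "outdoor",
    "public_square": "outdoor",
    "street_traffic": "outdoor",
    "park": "outdoor",
    "tram": "transportation",
    "bus": "transportation",
    "metro": "transportation",
}


def combine_two_stage(coarse_probs, fine_probs):
    """Weight each fine-class probability by its parent coarse-class probability."""
    scores = {
        scene: prob * coarse_probs[FINE_TO_COARSE[scene]]
        for scene, prob in fine_probs.items()
    }
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all combined scores are zero")
    # Renormalize so the combined scores sum to one.
    return {scene: score / total for scene, score in scores.items()}


def predict(coarse_probs, fine_probs):
    """Return the scene with the highest combined score."""
    combined = combine_two_stage(coarse_probs, fine_probs)
    return max(combined, key=combined.get)


# Hypothetical outputs: the ten-class model slightly prefers "metro_station",
# while the three-class model is confident the clip is "transportation".
coarse = {"indoor": 0.2, "outdoor": 0.1, "transportation": 0.7}
fine = {
    "airport": 0.05, "shopping_mall": 0.05, "metro_station": 0.35,
    "street_pedestrian": 0.05, "public_square": 0.05,
    "street_traffic": 0.05, "park": 0.05,
    "tram": 0.03, "bus": 0.02, "metro": 0.30,
}

scene = predict(coarse, fine)  # "metro": the coarse stage resolves the confusion
```

In this example the ten-class model alone would pick `metro_station`, but the coarse classifier's confidence in `transportation` shifts the decision to `metro`, which is the kind of correction a two-stage design is meant to provide.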

Code & Models

Repositories

MihawkHu/DCASE2020_task1
tf · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies