Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

Qiuqiang Kong; Yin Cao; Turab Iqbal; Yong Xu; Wenwu Wang; Mark D. Plumbley

arXiv:1904.03476 · cs.SD · June 11, 2019 · 36 cites

Open Access

TL;DR

This paper presents generic CNN-based baseline systems for multiple audio recognition tasks in the DCASE 2019 challenge, demonstrating that a 9-layer CNN with average pooling performs well across diverse tasks.

Contribution

It introduces cross-task CNN baseline systems for various audio recognition challenges, analyzing their performance without task-specific modifications.

Findings

1. 9-layer CNN with average pooling is effective for most tasks

2. Optimal CNN architecture varies depending on the task

3. Baseline systems provide a common framework for different audio tasks

Abstract

The Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge focuses on audio tagging, sound event detection and spatial localisation. DCASE 2019 consists of five tasks: 1) acoustic scene classification, 2) audio tagging with noisy labels and minimal supervision, 3) sound event localisation and detection, 4) sound event detection in domestic environments, and 5) urban sound tagging. In this paper, we propose generic cross-task baseline systems based on convolutional neural networks (CNNs). The motivation is to investigate the performance of a variety of models across several audio recognition tasks without exploiting the specific characteristics of the tasks. We looked at CNNs with 5, 9, and 13 layers, and found that the optimal architecture is task-dependent. For the systems we considered, we found that the 9-layer CNN with average pooling after convolutional…
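The average pooling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `avg_pool_2x2`, the non-overlapping 2x2 window, and the toy input values are assumptions chosen for illustration only, and the abstract as shown here is truncated, so nothing is claimed about where pooling sits in the actual networks. The sketch only shows what average pooling does to a small time-frequency feature map.

```python
def avg_pool_2x2(feature_map):
    """Non-overlapping 2x2 average pooling over a 2-D list (time x frequency).

    Assumption for illustration: a 2x2 window with stride 2. A trailing
    odd row or column is dropped rather than padded.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - rows % 2, 2):
        pooled_row = []
        for c in range(0, cols - cols % 2, 2):
            # Collect the four values covered by the current window.
            window = (
                feature_map[r][c],
                feature_map[r][c + 1],
                feature_map[r + 1][c],
                feature_map[r + 1][c + 1],
            )
            # Average pooling keeps the mean of the window, unlike max
            # pooling, which would keep only the largest value.
            pooled_row.append(sum(window) / 4.0)
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 patch of a feature map (values are made up).
patch = [
    [1.0, 3.0, 0.0, 2.0],
    [5.0, 7.0, 4.0, 6.0],
    [2.0, 2.0, 8.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
]
pooled = avg_pool_2x2(patch)
print(pooled)  # [[4.0, 3.0], [2.0, 2.0]]
```

Each output value is the mean of one 2x2 window, so a single large activation (the 8.0 in the lower-right window) is smoothed rather than passed through unchanged, which is the behavioural difference from max pooling.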

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies