DCASE 2018 Challenge Surrey Cross-Task convolutional neural network baseline

Qiuqiang Kong; Turab Iqbal; Yong Xu; Wenwu Wang; Mark D. Plumbley

arXiv:1808.00773·cs.SD·December 10, 2019·31 cites

TL;DR

This paper presents a unified CNN baseline for all five DCASE 2018 audio classification and sound event detection tasks, and shows that the deeper 8-layer network outperforms the 4-layer network on all tasks except Task 1.

Contribution

It introduces a cross-task CNN baseline system built on 4-layer and 8-layer networks derived from AlexNet and VGG, providing a common framework for diverse audio tasks.

Findings

01 The deeper CNN (8 layers) outperforms the shallower CNN (4 layers) on all tasks except Task 1.

02 The 8-layer CNN achieves competitive accuracy and F1 scores across all five tasks.

03 The authors released open-source code to facilitate future research.

Abstract

The Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 challenge consists of five audio classification and sound event detection tasks: 1) Acoustic scene classification, 2) General-purpose audio tagging of Freesound, 3) Bird audio detection, 4) Weakly-labeled semi-supervised sound event detection and 5) Multi-channel audio classification. In this paper, we create a cross-task baseline system for all five tasks based on a convolutional neural network (CNN): a "CNN Baseline" system. We implemented CNNs with 4 and 8 layers, derived from the AlexNet and VGG architectures used in computer vision. We investigated how performance varies from task to task when the same neural network configuration is used. Experiments show that the deeper CNN with 8 layers performs better than the CNN with 4 layers on all tasks except Task 1. Using the CNN with 8 layers, we achieve an accuracy of 0.680 on Task 1, an accuracy of…
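The abstract reports results as accuracy and the findings mention F1 scores. As a reminder of what those two numbers measure, here is a minimal sketch in plain Python. The function names and the toy labels are hypothetical and are not taken from the paper or its released code. The F1 shown is the simple binary form; the challenge tasks may use other variants (for example event-based F1), which this sketch does not attempt to reproduce.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the class `positive`: 2*TP / (2*TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    # Convention chosen here (an assumption): F1 is 0.0 when there are
    # no positive references and no positive predictions.
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels invented for illustration only (1 = event present).
    reference = [1, 0, 1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 0, 1, 1, 0]
    print("accuracy:", accuracy(reference, predicted))
    print("F1:", f1_score(reference, predicted))
```

Accuracy counts every correct decision equally, while F1 ignores true negatives, which is why detection tasks with rare events are often scored with F1 rather than accuracy.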

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies