Sound Event Detection of Weakly Labelled Data with CNN-Transformer and Automatic Threshold Optimization

Qiuqiang Kong; Yong Xu; Wenwu Wang; Mark D. Plumbley

arXiv:1912.04761 · cs.SD · August 25, 2020 · 5 citations

TL;DR

This paper introduces a CNN-Transformer model for weakly labelled sound event detection and proposes an automatic threshold optimization method, substantially improving audio tagging and detection F1 scores over previous approaches.

Contribution

It presents a novel CNN-Transformer architecture for audio tagging and SED, along with an automatic threshold optimization method to enhance detection accuracy.
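A common way for a single network to serve both tasks is to have it output frame-wise class probabilities (used for SED) and pool them over time into a clip-wise probability (used for audio tagging), which can then be trained against the weak clip-level tag. The minimal sketch below shows that pooling step for one class in plain Python. The pooling functions (max, mean, linear softmax) and all names are illustrative assumptions, not necessarily the pooling used in the paper, and the CNN-Transformer backbone that would produce `frame_probs` is not shown.

```python
import math
from typing import List

# Illustrative pooling functions (assumed, not taken from the paper) that turn
# frame-wise probabilities for ONE class into a single clip-wise probability.

def max_pool(frame_probs: List[float]) -> float:
    """Clip probability = the largest frame probability."""
    return max(frame_probs)

def mean_pool(frame_probs: List[float]) -> float:
    """Clip probability = the average frame probability."""
    return sum(frame_probs) / len(frame_probs)

def linear_softmax_pool(frame_probs: List[float]) -> float:
    """Each frame weighted by its own probability: sum(p^2) / sum(p)."""
    total = sum(frame_probs)
    return sum(p * p for p in frame_probs) / total if total > 0 else 0.0

def clip_bce(clip_prob: float, tag: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a pooled clip probability and a weak clip tag (0 or 1)."""
    p = min(max(clip_prob, eps), 1 - eps)  # clamp to avoid log(0)
    return -(tag * math.log(p) + (1 - tag) * math.log(1 - p))

# Usage with a toy output for one class in a clip tagged as containing it.
frame_probs = [0.1, 0.2, 0.9, 0.8, 0.1]
loss = clip_bce(linear_softmax_pool(frame_probs), tag=1)
```

Max pooling lets a single confident frame set the clip score, mean pooling spreads credit over every frame, and linear softmax pooling always falls between the two; which behaves best on a given dataset is an empirical question.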

Findings

01. Achieved state-of-the-art F1 scores for audio tagging and SED.
02. Demonstrated that CNN-Transformer performs comparably to a CRNN.
03. Automatic threshold optimization significantly improves F1 scores.
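Because F1 is computed from binarized predictions, it depends directly on the decision threshold, so thresholds tuned on validation data can raise F1 without retraining the model. The sketch below illustrates this with a plain per-class grid search over candidate thresholds. It is a simple stand-in assumed for illustration, not the automatic optimization procedure proposed in the paper; the function names, class names, and toy data are hypothetical.

```python
from typing import Dict, List, Sequence

def f1_score(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 for one class after binarizing probabilities at `threshold`."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif y and not pred:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def search_thresholds(
    probs: Dict[str, List[float]],
    labels: Dict[str, List[int]],
    candidates: Sequence[float],
) -> Dict[str, float]:
    """Hypothetical grid search (not the paper's method): for each class
    independently, keep the first candidate threshold with the highest F1."""
    return {
        cls: max(candidates, key=lambda t: f1_score(probs[cls], labels[cls], t))
        for cls in probs
    }

# Usage on toy validation data: clip-wise probabilities and weak tags per class.
val_probs = {"dog": [0.9, 0.65, 0.35, 0.15], "siren": [0.35, 0.25, 0.15, 0.05]}
val_labels = {"dog": [1, 1, 0, 0], "siren": [1, 0, 0, 0]}
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
thresholds = search_thresholds(val_probs, val_labels, candidates)
```

Tuning each class separately lets rare or hard-to-detect classes use a lower threshold than frequent ones, which a single global threshold cannot do.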

Abstract

Sound event detection (SED) is the task of detecting sound events in an audio recording. One challenge of the SED task is that many datasets, such as the Detection and Classification of Acoustic Scenes and Events (DCASE) datasets, are weakly labelled. That is, there are only audio tags for each audio clip, without the onset and offset times of sound events. We compare segment-wise and clip-wise training for SED, a comparison lacking in previous works. We propose a convolutional neural network transformer (CNN-Transformer) for audio tagging and SED, and show that CNN-Transformer performs similarly to a convolutional recurrent neural network (CRNN). Another challenge of SED is that thresholds are required for detecting sound events. Previous works set thresholds empirically, which is not optimal. To solve this problem, we propose an automatic threshold optimization method. The first…
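Weak labels give only clip-level tags, yet SED must report onset and offset times, so frame-wise probabilities have to be binarized with a threshold before events can be read off. The minimal sketch below shows that step for one class, assuming a single fixed threshold and a hypothetical frame hop `hop_seconds`; it is an illustration, not the paper's post-processing.

```python
from typing import List, Optional, Tuple

def frames_to_events(
    frame_probs: List[float],
    threshold: float,
    hop_seconds: float,
) -> List[Tuple[float, float]]:
    """Return (onset, offset) pairs in seconds for each run of consecutive
    frames whose probability is >= threshold (single-threshold assumption)."""
    events = []
    onset_frame: Optional[int] = None
    for i, p in enumerate(frame_probs):
        if p >= threshold and onset_frame is None:
            onset_frame = i  # an event starts at this frame
        elif p < threshold and onset_frame is not None:
            # The event ended at the previous frame; its offset is the start of frame i.
            events.append((onset_frame * hop_seconds, i * hop_seconds))
            onset_frame = None
    if onset_frame is not None:
        # The event is still active at the end of the clip.
        events.append((onset_frame * hop_seconds, len(frame_probs) * hop_seconds))
    return events

# Usage: toy frame-wise probabilities for one class with an assumed 0.5 s hop.
events = frames_to_events([0.1, 0.7, 0.8, 0.2, 0.6, 0.9], threshold=0.5, hop_seconds=0.5)
```

With a threshold of 0.5 this toy input yields two events, while lowering the threshold to 0.15 merges them into one, which shows how strongly the detected onsets and offsets depend on the threshold choice.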

Code & Models

Repositories

qiuqiangkong/sound_event_detection_dcase2017_task4
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies