Sound event detection using weakly labeled dataset with stacked convolutional and recurrent neural network

Sharath Adavanne; Tuomas Virtanen

arXiv:1710.02998·cs.SD·October 10, 2017·42 cites

TL;DR

This paper introduces a neural network architecture that learns to detect the start and end times of sound events using only weak labels, by combining convolutional and recurrent layers with a novel training scheme.

Contribution

It presents a stacked convolutional and recurrent neural network with a dual prediction layer approach for weakly supervised sound event detection.

Findings

01 Achieved an error rate of 0.84 for strong labels

02 F-score of 43.3% for weak labels on test data

03 Effective training scheme controlling learning from weak and strong labels

Abstract

This paper proposes a neural network architecture and training scheme to learn the start and end times of sound events (strong labels) in an audio recording, given only the list of sound events present in the audio without time information (weak labels). We achieve this by using a stacked convolutional and recurrent neural network with two prediction layers in sequence, one for the strong labels followed by one for the weak labels. The network is trained using frame-wise log mel-band energy as the input audio feature, and the weak labels provided in the dataset as targets for the weak label prediction layer. Strong labels are generated by replicating the weak labels once for each frame of the input audio feature, and are used as targets for the strong label prediction layer during training. We propose to control what the network learns from the weak and strong labels by different weighting for the loss computed in the…
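The label replication and loss weighting described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the use of binary cross-entropy, and the weights `strong_weight` and `weak_weight` are assumptions made for illustration; the abstract does not state the loss function or any weight values.

```python
import numpy as np


def replicate_weak_labels(weak_labels, n_frames):
    """Tile a clip-level label vector (n_classes,) into (n_frames, n_classes).

    This mirrors the abstract's description of generating strong-label
    targets by repeating the weak labels once per input frame.
    """
    weak_labels = np.asarray(weak_labels, dtype=np.float64)
    return np.tile(weak_labels, (n_frames, 1))


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy (an assumed choice of loss)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))


def weighted_loss(weak_true, weak_pred, strong_pred,
                  strong_weight=1.0, weak_weight=1.0):
    """Weighted sum of the strong-layer and weak-layer losses.

    strong_weight and weak_weight are hypothetical hyperparameters.
    strong_pred has shape (n_frames, n_classes); weak_pred has shape
    (n_classes,).
    """
    strong_pred = np.asarray(strong_pred, dtype=np.float64)
    strong_true = replicate_weak_labels(weak_true, strong_pred.shape[0])
    strong_loss = binary_cross_entropy(strong_true, strong_pred)
    weak_loss = binary_cross_entropy(weak_true, weak_pred)
    return strong_weight * strong_loss + weak_weight * weak_loss
```

Under these assumptions, setting one weight to zero makes the network's training signal come entirely from the other prediction layer, which is one way to read "controlling what the network learns" from each label type.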

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies