TL;DR
This paper introduces novel forward-backward convolutional recurrent neural networks and tag-conditioned CNNs for weakly labeled semi-supervised sound event detection, achieving state-of-the-art results in the DCASE 2020 Challenge.
Contribution
The paper presents two new models, FBCRNN and tag-conditioned CNN, for improved sound event detection using weak labels and pseudo strong labels.
Findings
Achieved an 18.0% improvement in event-based F1-score over the baseline.
Outperformed top challenge systems on the validation set.
The proposed models enable detection on short audio segments.
Abstract
In this paper we present our system for the detection and classification of acoustic scenes and events (DCASE) 2020 Challenge Task 4: Sound event detection and separation in domestic environments. We introduce two new models: the forward-backward convolutional recurrent neural network (FBCRNN) and the tag-conditioned convolutional neural network (CNN). The FBCRNN employs two recurrent neural network (RNN) classifiers sharing the same CNN for preprocessing. With one RNN processing a recording in forward direction and the other in backward direction, the two networks are trained to jointly predict audio tags, i.e., weak labels, at each time step within a recording, given that at each time step they have jointly processed the whole recording. The proposed training encourages the classifiers to tag events as soon as possible. Therefore, after training, the networks can be applied to shorter…
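The forward-backward tagging idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. A per-frame linear layer with ReLU stands in for the shared CNN, plain tanh RNNs stand in for the recurrent classifiers, and the element-wise maximum used to combine the two directions is an assumption made for illustration. All names (`run_rnn`, `fbcrnn_tag_posteriors`, `weak_label_loss`, `init_params`) and all sizes are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def run_rnn(inputs, w_in, w_rec, w_out):
    """Plain tanh RNN that emits per-class tag probabilities at every step."""
    hidden = np.zeros(w_rec.shape[0])
    outputs = []
    for frame in inputs:
        hidden = np.tanh(frame @ w_in + hidden @ w_rec)
        outputs.append(sigmoid(hidden @ w_out))
    return np.stack(outputs)  # shape (T, num_classes)


def fbcrnn_tag_posteriors(features, params):
    """Return joint, forward and backward tag probabilities per time step."""
    # Shared front end (stand-in for the shared CNN): per-frame linear + ReLU.
    shared = np.maximum(features @ params["w_front"], 0.0)
    # Forward classifier: the output at step t has seen frames 0..t.
    fwd = run_rnn(shared, *params["fwd"])
    # Backward classifier: run on the reversed sequence, then flip back,
    # so the output at step t has seen frames t..T-1.
    bwd = run_rnn(shared[::-1], *params["bwd"])[::-1]
    # Assumed combination rule: element-wise maximum. Together the two
    # classifiers have processed the whole recording at every step.
    return np.maximum(fwd, bwd), fwd, bwd


def weak_label_loss(joint, weak_labels, eps=1e-7):
    """Binary cross-entropy of every time step against the clip-level tags."""
    targets = np.broadcast_to(weak_labels, joint.shape)
    probs = np.clip(joint, eps, 1.0 - eps)
    bce = targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)
    return float(-np.mean(bce))


def init_params(num_features, num_hidden, num_classes, seed=0):
    """Small random weights; the sizes are arbitrary illustration values."""
    rng = np.random.default_rng(seed)

    def rnn_weights():
        return (
            rng.normal(0.0, 0.1, (num_hidden, num_hidden)),   # input weights
            rng.normal(0.0, 0.1, (num_hidden, num_hidden)),   # recurrent weights
            rng.normal(0.0, 0.1, (num_hidden, num_classes)),  # output weights
        )

    return {
        "w_front": rng.normal(0.0, 0.1, (num_features, num_hidden)),
        "fwd": rnn_weights(),
        "bwd": rnn_weights(),
    }


# Toy recording: 50 frames of 16-dimensional features, 10 event classes.
features = np.random.default_rng(1).normal(size=(50, 16))
params = init_params(num_features=16, num_hidden=8, num_classes=10)
weak_labels = np.zeros(10)
weak_labels[[2, 5]] = 1.0  # two events are tagged as active in the clip

joint, fwd, bwd = fbcrnn_tag_posteriors(features, params)
loss = weak_label_loss(joint, weak_labels)
print(joint.shape, loss)
```

Because the weak label is used as the target at every time step, each direction is penalized until it has tagged an active event, which matches the abstract's point that training encourages tagging as early as possible.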
