# Duration robust weakly supervised sound event detection

**Authors:** Heinrich Dinkel, Kai Yu

arXiv: 1904.03841 · 2020-04-13

## TL;DR

This paper examines how fixed-size-window median filter post-processing affects weakly supervised sound event detection and advocates double thresholding as a more robust and predictable alternative. It also proposes four temporal subsampling methods within CRNNs that improve the F1 score and robustness to short, sporadic events.

## Contribution

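The paper introduces four temporal subsampling techniques within the CRNN framework (mean-max, alpha-mean-max, Lp-norm and convolutional) and advocates double thresholding as a more robust and predictable post-processing method than fixed-size-window median filtering.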
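To make the post-processing contrast concrete, here is a minimal sketch of double thresholding as it is commonly defined (hysteresis thresholding), next to a fixed-window median filter. The function names, the threshold values `hi=0.75` and `lo=0.2`, and the window size are illustrative assumptions, not settings taken from the paper.

```python
def double_threshold(probs, hi=0.75, lo=0.2):
    """Return (onset, offset) frame pairs, offset exclusive.

    Assumed definition: a contiguous run of frames above `lo` is kept
    only if at least one of its frames also exceeds `hi`.
    """
    events = []
    start = None
    has_seed = False
    for i, p in enumerate(probs):
        if p > lo:
            if start is None:
                start = i          # a candidate region begins
                has_seed = False
            if p > hi:
                has_seed = True    # region contains a confident frame
        else:
            if start is not None and has_seed:
                events.append((start, i))
            start = None
    if start is not None and has_seed:
        events.append((start, len(probs)))
    return events


def median_filter(binary, window=3):
    """Median-filter a 0/1 sequence with an odd window, zero-padded at the edges."""
    half = window // 2
    padded = [0] * half + list(binary) + [0] * half
    return [sorted(padded[i:i + window])[half] for i in range(len(binary))]


# Hypothetical frame-level probabilities for one event class.
probs = [0.1, 0.3, 0.8, 0.4, 0.1, 0.3, 0.3, 0.1, 0.9]
events = double_threshold(probs)
smoothed = median_filter([1 if p > 0.5 else 0 for p in probs], window=3)
```

In this toy input, double thresholding keeps the one-frame event at the end and extends the first event to its lower-confidence neighbours, while the median filter applied to a single 0.5 threshold removes both isolated detections. This illustrates why a fixed window can penalize short events; it is not a reproduction of the paper's experiments.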

## Key findings

- Best single model achieves 30.1% F1 on the evaluation set
- Best fusion model achieves 32.5% F1 on the evaluation set
- Temporal subsampling within the network improves both the F1 score and robustness to short, sporadic events

## Abstract

Task 4 of the DCASE2018 challenge demonstrated that substantially more research is needed for a real-world application of sound event detection. An analysis of the challenge results shows that the most successful models are biased towards predicting long (e.g., over 5s) clips. This work aims to investigate the performance impact of fixed-sized window median filter post-processing and advocate the use of double thresholding as a more robust and predictable post-processing method. Further, four different temporal subsampling methods within the CRNN framework are proposed: mean-max, alpha-mean-max, Lp-norm and convolutional. We show that for this task subsampling the temporal resolution by a neural network enhances the F1 score as well as its robustness towards short, sporadic sound events. Our best single model achieves 30.1% F1 on the evaluation set and the best fusion model 32.5%, while being robust to event length variations.
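The abstract names the subsampling methods without defining them, so the following is only a hedged sketch of how three of them could look when pooling a one-dimensional sequence of frame values. The formulas are assumptions: mean-max is taken as the sum of mean and max pooling, alpha-mean-max as a weighted blend with a fixed `alpha` (in a network it would presumably be learned), and Lp-norm as the power mean. The convolutional variant is omitted because it depends on learned weights. All function names are hypothetical.

```python
def pool_time(frames, factor, reduce_fn):
    """Subsample a sequence by applying reduce_fn to non-overlapping windows."""
    return [reduce_fn(frames[i:i + factor])
            for i in range(0, len(frames) - factor + 1, factor)]


def mean_pool(window):
    return sum(window) / len(window)


def max_pool(window):
    return max(window)


def mean_max(window):
    # Assumed form: sum of mean and max pooling.
    return mean_pool(window) + max_pool(window)


def alpha_mean_max(window, alpha=0.5):
    # Assumed form: alpha weights max pooling, (1 - alpha) weights mean pooling.
    return alpha * max_pool(window) + (1 - alpha) * mean_pool(window)


def lp_norm(window, p=4.0):
    # Power mean: p = 1 gives the mean of absolute values,
    # and large p approaches the maximum absolute value.
    return (sum(abs(x) ** p for x in window) / len(window)) ** (1.0 / p)


# Hypothetical activations over six frames, subsampled by a factor of 2.
frames = [0.0, 0.0, 1.0, 0.0, 0.5, 0.5]
pooled_mean = pool_time(frames, 2, mean_pool)
pooled_max = pool_time(frames, 2, max_pool)
pooled_mean_max = pool_time(frames, 2, mean_max)
pooled_alpha = pool_time(frames, 2, alpha_mean_max)
```

The second window, `[1.0, 0.0]`, shows the trade-off: mean pooling halves a one-frame spike, max pooling preserves it, and the blended variants sit in between, which is one plausible reason such pooling choices matter for short events.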

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.03841/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1904.03841/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1904.03841/full.md

---
Source: https://tomesphere.com/paper/1904.03841