# Evaluation of post-processing algorithms for polyphonic sound event detection

**Authors:** Leo Cances, Patrice Guyot, Thomas Pellegrini

arXiv: 1906.06909 · 2019-06-25

## TL;DR

This paper evaluates post-processing algorithms for polyphonic sound event detection on the DCASE 2018 challenge task 4 data. It compares statistic-based and parametric temporal segmentation methods and shows that the choice of post-processing has a crucial impact on the final detection score.

## Contribution

The paper systematically compares post-processing steps (smoothing, thresholding, and optimization) for the temporal segmentation of frame-level class predictions, a step the authors note is often overlooked in publications.

## Key findings

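- Statistic-based methods yielded a 22.9% event-based F-score on the evaluation set with the authors' MIL model.
- Class-dependent parametric methods gave the best results, with a 32.0% F-score.
- The choice of post-processing method has a crucial impact on the final detection score.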

## Abstract

Sound event detection (SED) aims at identifying audio events (audio tagging task) in recordings and then locating them temporally (localization task). This last task ends with the segmentation of the frame-level class predictions, that determines the onsets and offsets of the audio events. Yet, this step is often overlooked in scientific publications. In this paper, we focus on the post-processing algorithms used to identify the audio event boundaries. Different post-processing steps are investigated, through smoothing, thresholding, and optimization. In particular, we evaluate different approaches for temporal segmentation, namely statistic-based and parametric methods. Experiments are carried out on the DCASE 2018 challenge task 4 data. We compared post-processing algorithms on the temporal prediction curves of two models: one based on the challenge's baseline and a Multiple Instance Learning (MIL) model. Results show the crucial impact of the post-processing methods on the final detection score. Statistic-based methods yield a 22.9% event-based F-score on the evaluation set with our MIL model. Moreover, the best results were obtained using class-dependent parametric methods with 32.0% F-score.
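To make the smoothing, thresholding, and segmentation steps concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the moving-average smoother, the fixed per-class thresholds, and the class labels in the usage note are all assumptions chosen for illustration. It only shows how a frame-level prediction curve might be turned into onset/offset pairs.

```python
from typing import Dict, List, Tuple


def moving_average(curve: List[float], window: int = 3) -> List[float]:
    """Smooth a frame-level curve with a centered moving average.

    Near the edges the window is truncated, so the mean is taken over
    fewer frames. (Hypothetical helper, not from the paper.)
    """
    half = window // 2
    smoothed = []
    for i in range(len(curve)):
        lo = max(0, i - half)
        hi = min(len(curve), i + half + 1)
        smoothed.append(sum(curve[lo:hi]) / (hi - lo))
    return smoothed


def segment(curve: List[float], threshold: float,
            frame_duration: float) -> List[Tuple[float, float]]:
    """Return (onset, offset) pairs, in seconds, for runs of frames
    whose value is at or above `threshold`."""
    events = []
    onset = None
    for i, p in enumerate(curve):
        if p >= threshold and onset is None:
            onset = i  # event starts at this frame
        elif p < threshold and onset is not None:
            events.append((onset * frame_duration, i * frame_duration))
            onset = None
    if onset is not None:
        # The event is still active at the end of the recording.
        events.append((onset * frame_duration, len(curve) * frame_duration))
    return events


def segment_per_class(curves: Dict[str, List[float]],
                      thresholds: Dict[str, float],
                      frame_duration: float,
                      window: int = 3) -> Dict[str, List[Tuple[float, float]]]:
    """Apply smoothing and a class-dependent threshold to each class curve.

    `thresholds` maps each class name to its own threshold; how those
    values are chosen (e.g. by optimization on a development set) is
    outside the scope of this sketch.
    """
    return {
        name: segment(moving_average(curve, window), thresholds[name], frame_duration)
        for name, curve in curves.items()
    }
```

A call such as `segment_per_class({"Speech": speech_curve, "Dog": dog_curve}, {"Speech": 0.5, "Dog": 0.3}, frame_duration=0.02)` would return one list of event boundaries per class. The class names, threshold values, and frame duration here are placeholders; a statistic-based variant could instead derive the threshold from the curve itself, for example from its mean.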

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.06909/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1906.06909/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/1906.06909/full.md

---
Source: https://tomesphere.com/paper/1906.06909