Sound Event Bounding Boxes

Janek Ebbers; Francois G. Germain; Gordon Wichern; Jonathan Le Roux

arXiv:2406.04212·eess.AS·June 7, 2024

TL;DR

This paper introduces sound event bounding boxes (SEBBs), a prediction format for sound event detection that decouples each event's extent from its confidence. Converting frame-level outputs into SEBBs significantly improves system performance and surpasses the previous state of the art.

Contribution

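The paper proposes SEBBs, a prediction format that represents each sound event as a class, an extent, and an overall confidence, together with a change-detection-based algorithm that converts legacy frame-level outputs into SEBBs to improve event extent accuracy and system performance.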

Findings

01 SEBBs improve event extent prediction accuracy.

02 The change-detection algorithm boosts DCASE 2023 Challenge system performance.

03 State-of-the-art PSDS1 score increased from 0.644 to 0.686.

Abstract

Sound event detection is the task of recognizing sounds and determining their extent (onset/offset times) within an audio clip. Existing systems commonly predict sound presence confidence in short time frames. Then, thresholding produces binary frame-level presence decisions, with the extent of individual events determined by merging consecutive positive frames. In this paper, we show that frame-level thresholding degrades the prediction of the event extent by coupling it with the system's sound presence confidence. We propose to decouple the prediction of event extent and confidence by introducing SEBBs, which format each sound event prediction as a tuple of a class type, extent, and overall confidence. We also propose a change-detection-based algorithm to convert legacy frame-level outputs into SEBBs. We find the algorithm significantly improves the performance of DCASE 2023 Challenge…
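The coupling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code and does not implement their change-detection algorithm: it only shows the legacy decoding step (threshold, then merge consecutive positive frames) and a hypothetical `SEBB` container. The function name `threshold_and_merge`, the toy scores, the 0.5 s hop, the label `"dog_bark"`, and the use of the maximum frame score as the event confidence are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SEBB:
    """Hypothetical container for one event: class, extent, overall confidence."""
    label: str
    onset: float       # seconds
    offset: float      # seconds
    confidence: float  # a single score for the whole event


def threshold_and_merge(
    scores: List[float], threshold: float, hop: float
) -> List[Tuple[float, float]]:
    """Legacy decoding: binarize frame scores, then merge consecutive positives."""
    events = []
    start = None
    for i, score in enumerate(scores):
        active = score > threshold
        if active and start is None:
            start = i  # a new event begins at this frame
        elif not active and start is not None:
            events.append((start * hop, i * hop))  # the event ended before frame i
            start = None
    if start is not None:  # event still active at the end of the clip
        events.append((start * hop, len(scores) * hop))
    return events


# Toy frame-level presence scores for one class (assumed values).
scores = [0.1, 0.4, 0.8, 0.9, 0.7, 0.4, 0.1]
hop = 0.5  # assumed frame hop in seconds

# The same underlying event gets a different extent at each threshold.
low = threshold_and_merge(scores, threshold=0.3, hop=hop)
high = threshold_and_merge(scores, threshold=0.6, hop=hop)

# A SEBB fixes the extent once and carries a separate confidence.
# Taking the maximum frame score as confidence is an assumption of this sketch.
sebb = SEBB(label="dog_bark", onset=low[0][0], offset=low[0][1], confidence=max(scores))
```

In this toy example the detected extent shrinks as the threshold rises, which is the coupling between extent and confidence that the abstract describes. With a SEBB-style output, a detection threshold would instead accept or reject the whole event based on `confidence`, leaving `onset` and `offset` unchanged.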

Code & Models

Repositories

merlresearch/sebbs (Official)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Time Series Analysis and Forecasting