Sound Event Detection with Boundary-Aware Optimization and Inference

Florian Schmid; Chi Ian Tang; Sanjeel Parekh; Vamsi Krishna Ithapu; Juan Azcarreta Ortiz; Giacomo Ferroni; Yijun Qian; Arnoldas Jasonas; Cosmin Frateanu; Camilla Clark; Gerhard Widmer; Çağdaş Bilen

arXiv:2601.04178·eess.AS·January 8, 2026

Open Access

TL;DR

This paper introduces a boundary-aware approach for sound event detection that explicitly models event onsets and offsets, leading to more accurate and scalable detection without extensive post-processing.

Contribution

The paper proposes two new temporal modeling layers, Recurrent Event Detection (RED) and Event Proposal Network (EPN), together with tailored loss functions, to make sound event detection more accurate and efficient.

Findings

01 Outperforms traditional frame-wise SED models with state-of-the-art post-processing on a subset of the strongly annotated portion of AudioSet

02 Eliminates the need for post-processing hyperparameter tuning

03 Achieves new state-of-the-art performance across AudioSet classes

Abstract

Temporal detection problems appear in many fields including time-series estimation, activity recognition and sound event detection (SED). In this work, we propose a new approach to temporal event modeling by explicitly modeling event onsets and offsets, and by introducing boundary-aware optimization and inference strategies that substantially enhance temporal event detection. The presented methodology incorporates new temporal modeling layers - Recurrent Event Detection (RED) and Event Proposal Network (EPN) - which, together with tailored loss functions, enable more effective and precise temporal event detection. We evaluate the proposed method in the SED domain using a subset of the temporally-strongly annotated portion of AudioSet. Experimental results show that our approach not only outperforms traditional frame-wise SED models with state-of-the-art post-processing, but also removes…
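For context on the frame-wise baseline the abstract compares against, the sketch below shows one common way per-frame probabilities are turned into events: median smoothing followed by thresholding. It is not the paper's RED or EPN method, and the function names, default values, and the `hop_seconds` frame spacing are illustrative assumptions.

```python
from statistics import median
from typing import List, Tuple


def median_filter(values: List[float], width: int) -> List[float]:
    """Sliding median with an odd window; edges are padded by repeating the end values."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    if not values:
        return []
    half = width // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(values))]


def decode_events(
    probs: List[float],
    threshold: float = 0.5,    # hypothetical default, normally tuned per class
    filter_width: int = 3,     # hypothetical default, normally tuned per class
    hop_seconds: float = 0.04, # assumed spacing between frames
) -> List[Tuple[float, float]]:
    """Convert per-frame probabilities for one class into (onset, offset) pairs in seconds."""
    smoothed = median_filter(probs, filter_width)
    active = [p >= threshold for p in smoothed]

    events: List[Tuple[float, float]] = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i  # first active frame marks the onset
        elif not is_active and start is not None:
            events.append((start * hop_seconds, i * hop_seconds))
            start = None
    if start is not None:  # event still active at the end of the clip
        events.append((start * hop_seconds, len(active) * hop_seconds))
    return events


if __name__ == "__main__":
    frame_probs = [0.1, 0.9, 0.1, 0.8, 0.9, 0.85, 0.2, 0.1]
    # The same probabilities decode to different boundaries as the filter changes.
    print(decode_events(frame_probs, filter_width=3, hop_seconds=1.0))
    print(decode_events(frame_probs, filter_width=1, hop_seconds=1.0))
```

In this toy input, changing only `filter_width` changes both the number of detected events and where the first onset falls. That dependence of event boundaries on post-processing settings is the kind of tuning the summary above says the boundary-aware approach avoids.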

Taxonomy

Topics: Music and Audio Processing · Time Series Analysis and Forecasting · Speech and Audio Processing