Sound Event Detection with Boundary-Aware Optimization and Inference
Florian Schmid, Chi Ian Tang, Sanjeel Parekh, Vamsi Krishna Ithapu, Juan Azcarreta Ortiz, Giacomo Ferroni, Yijun Qian, Arnoldas Jasonas, Cosmin Frateanu, Camilla Clark, Gerhard Widmer, Çağdaş Bilen

TL;DR
This paper introduces a boundary-aware approach for sound event detection that explicitly models event onsets and offsets, leading to more accurate and scalable detection without extensive post-processing.
Contribution
It proposes two new temporal modeling layers, Recurrent Event Detection (RED) and Event Proposal Network (EPN), together with tailored loss functions, to improve the accuracy and efficiency of sound event detection.
Findings
Outperforms traditional frame-wise SED models with state-of-the-art post-processing on a subset of the strongly annotated portion of AudioSet
Eliminates need for post-processing hyperparameter tuning
Achieves new state-of-the-art performance across AudioSet classes
Abstract
Temporal detection problems appear in many fields including time-series estimation, activity recognition and sound event detection (SED). In this work, we propose a new approach to temporal event modeling by explicitly modeling event onsets and offsets, and by introducing boundary-aware optimization and inference strategies that substantially enhance temporal event detection. The presented methodology incorporates new temporal modeling layers, Recurrent Event Detection (RED) and Event Proposal Network (EPN), which, together with tailored loss functions, enable more effective and precise temporal event detection. We evaluate the proposed method in the SED domain using a subset of the temporally-strongly annotated portion of AudioSet. Experimental results show that our approach not only outperforms traditional frame-wise SED models with state-of-the-art post-processing, but also removes…
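For context, the frame-wise baseline that the abstract contrasts against is commonly decoded by smoothing and thresholding per-frame probabilities. The sketch below illustrates that conventional post-processing only; it is not the paper's RED or EPN method. The function names, the threshold, the median filter width, and the hop size are all assumptions chosen for illustration.

```python
from statistics import median


def median_filter(values, width):
    """Sliding median of odd `width`, repeating the edge values as padding."""
    half = width // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(values))]


def frames_to_events(probs, threshold=0.5, filter_width=3, hop_seconds=0.02):
    """Decode one class's per-frame probabilities into (onset, offset) pairs.

    `threshold`, `filter_width` and `hop_seconds` are hypothetical
    post-processing hyperparameters, not values taken from the paper.
    """
    if not probs:
        return []
    smoothed = median_filter(probs, filter_width)
    active = [p > threshold for p in smoothed]

    events, start = [], None
    for i, is_on in enumerate(active):
        if is_on and start is None:
            start = i  # onset: first active frame of a run
        elif not is_on and start is not None:
            events.append((start * hop_seconds, i * hop_seconds))  # offset
            start = None
    if start is not None:  # event still active at the end of the clip
        events.append((start * hop_seconds, len(active) * hop_seconds))
    return events


# Made-up probabilities for a single class, one frame per second.
probs = [0.1, 0.2, 0.9, 0.8, 0.3, 0.9, 0.85, 0.1, 0.05, 0.7, 0.1]
smoothed_events = frames_to_events(probs, filter_width=3, hop_seconds=1.0)
raw_events = frames_to_events(probs, filter_width=1, hop_seconds=1.0)
```

On this toy input, a filter width of 3 merges the activity into one event, while a width of 1 yields three separate events. Event boundaries in such a pipeline therefore depend on hyperparameters that have to be tuned, which is the kind of post-processing dependence the abstract refers to.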
Taxonomy
Topics: Music and Audio Processing · Time Series Analysis and Forecasting · Speech and Audio Processing
