Sound Event Detection Using Duration Robust Loss Function
Daichi Akiyama, Keisuke Imoto, Noriyuki Tonami, Yuki Okamoto, Ryosuke Yamanishi, Takahiro Fukumori, Yoichi Yamashita

TL;DR
This paper introduces a duration robust loss function for sound event detection that addresses class imbalance by focusing training on short-duration events, leading to improved detection performance.
Contribution
The paper proposes a novel class-wise reweighted loss function that enhances sound event detection by accounting for event duration variability.
Findings
Improved detection performance by 3.15 percentage points on TUT Sound Events 2016/2017.
Enhanced detection accuracy by 4.37 percentage points on TUT Acoustic Scenes 2016.
Demonstrated effectiveness of duration-aware loss in handling class imbalance.
Abstract
Many methods of sound event detection (SED) based on machine learning regard a segmented time frame as one data sample for model training. However, the durations of sound events vary greatly depending on the sound event class, e.g., the sound event "fan" has a long duration, while the sound event "mouse clicking" is instantaneous. The difference in duration between sound event classes thus causes a serious data imbalance problem in SED. In this paper, we propose a method for SED using a duration robust loss function, which can focus model training on sound events of short duration. In the proposed method, we focus on the relationship between the duration of a sound event and the ease or difficulty of model training. In particular, many sound events of long duration (e.g., the sound event "fan") are stationary sounds, which have less variation in their acoustic features…
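The excerpt above does not give the exact form of the proposed loss, so the following is only a minimal sketch of the general idea of class-wise reweighting in frame-level SED. It assumes a simple inverse-active-ratio weighting applied to binary cross entropy, which is a stand-in and not the authors' formulation. The function names `class_weights_from_duration` and `weighted_bce` are hypothetical.

```python
import numpy as np

def class_weights_from_duration(labels, eps=1e-8):
    """Assumed weighting scheme (not taken from the paper).

    labels: (num_frames, num_classes) binary activity matrix.
    Returns one weight per class; classes active in fewer frames
    (short-duration events) receive larger weights.
    """
    active_ratio = labels.mean(axis=0)        # fraction of frames where each class is active
    inv = 1.0 / (active_ratio + eps)          # rarer classes get larger raw weights
    return inv / inv.sum() * labels.shape[1]  # rescale so the weights average to 1

def weighted_bce(probs, labels, weights, eps=1e-7):
    """Frame-level binary cross entropy, scaled per class by `weights`."""
    probs = np.clip(probs, eps, 1.0 - eps)    # avoid log(0)
    bce = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    return float((bce * weights).mean())      # weights broadcast over the frame axis

# Toy example: class 0 behaves like "fan" (active in 90 of 100 frames),
# class 1 like "mouse clicking" (active in 2 of 100 frames).
labels = np.zeros((100, 2))
labels[:90, 0] = 1.0
labels[:2, 1] = 1.0
weights = class_weights_from_duration(labels)
loss = weighted_bce(np.full_like(labels, 0.5), labels, weights)
```

In this toy setup the short-duration class receives a much larger weight than the long-duration class, so errors on its few active frames contribute more to the loss than they would under unweighted binary cross entropy.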
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
