Impact of Sound Duration and Inactive Frames on Sound Event Detection Performance
Keisuke Imoto, Sakiko Mishima, Yumi Arai, Reishi Kondo

TL;DR
This paper investigates how sound duration and inactive frames affect sound event detection performance, proposing loss functions to address data imbalance issues caused by class duration differences and infrequent event occurrences.
Contribution
It introduces four loss functions to mitigate data imbalance in SED caused by varying sound durations and inactive frames, providing new strategies for improved detection accuracy.
Findings
Loss functions improve detection of rare and long-duration sound events.
Addressing data imbalance enhances overall SED performance.
The paper offers insights into handling class and frame imbalance in sound event detection.
Abstract
In many methods of sound event detection (SED), a segmented time frame is regarded as one data sample for model training. The durations of sound events greatly depend on the sound event class, e.g., the sound event "fan" has a long duration, whereas the sound event "mouse clicking" is instantaneous. Thus, the difference in duration between sound event classes results in a serious data imbalance in SED. Moreover, most sound events occur only occasionally; therefore, there are many more inactive time frames of sound events than active frames. This also causes a severe data imbalance between active and inactive frames. In this paper, we investigate the impact of sound duration and inactive frames on SED performance by introducing four loss functions: simple reweighting loss, inverse frequency loss, asymmetric focal loss, and focal batch Tversky loss. Then, we provide insights…
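The class-duration imbalance described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's formulation: the function names, the smoothing constant `eps`, the normalisation of weights to mean 1, and the toy frame counts are all assumptions made for illustration. It weights a frame-level binary cross-entropy by the inverse of each class's active-frame count, which is one common reading of "inverse frequency" weighting.

```python
import numpy as np

def inverse_frequency_weights(labels, eps=1.0):
    """Per-class weights proportional to 1 / (number of active frames).

    labels: binary array of shape (frames, classes).
    eps: hypothetical smoothing term so a class with no active frames
         does not cause a division by zero.
    """
    counts = labels.sum(axis=0).astype(float)
    weights = 1.0 / (counts + eps)
    # Normalise so the weights average to 1 (an illustrative choice).
    return weights * len(weights) / weights.sum()

def weighted_bce(probs, labels, class_weights, tiny=1e-7):
    """Frame-level binary cross-entropy, scaled per class."""
    p = np.clip(probs, tiny, 1.0 - tiny)
    per_frame = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))
    return float((per_frame * class_weights).mean())

# Toy clip of 100 frames with two classes (made-up counts):
# column 0 ~ "fan" (active in 80 frames), column 1 ~ "mouse clicking" (2 frames).
labels = np.zeros((100, 2))
labels[:80, 0] = 1
labels[10:12, 1] = 1

weights = inverse_frequency_weights(labels)
probs = np.full(labels.shape, 0.5)  # an uninformative model output
loss = weighted_bce(probs, labels, weights)
```

In this toy clip the short class receives a much larger weight than the long one, so its few active frames contribute more to the loss. The sketch applies the same class weight to active and inactive frames; handling the active/inactive imbalance separately would need a different weighting.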
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
