Affinity Mixup for Weakly Supervised Sound Event Detection
Mohammad Rasool Izadi, Robert Stevenson, Laura N. Kloepper

TL;DR
This paper introduces Affinity Mixup, a regularization method for weakly supervised sound event detection. It exploits time-level similarities through an adaptive affinity matrix and improves event-F1 scores by 8.2% over state-of-the-art methods.
Contribution
The paper proposes affinity mixup, a regularization technique motivated by attention and graph neural networks, to improve weakly supervised sound event detection.
Findings
Achieved 8.2% improvement in event-F1 scores over state-of-the-art methods.
Introduced an adaptive affinity matrix to relate frames in sound recordings.
Enhanced model performance by incorporating time-level similarities.
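Event-F1 scores events, not frames: a predicted event counts only if its label, onset, and offset all line up with a reference event. The sketch below is a simplified, single-recording version. The tolerances are assumptions modeled on the DCASE convention: a 200 ms onset collar and an offset tolerance of max(200 ms, 20% of the reference duration). The matching here is greedy and one-to-one. The page does not state the paper's exact evaluation settings, and `event_f1` is a hypothetical helper, not a library API.

```python
def event_f1(reference, estimated, collar=0.2, offset_ratio=0.2):
    """Simplified event-based F1 for one recording (illustrative sketch).

    reference, estimated: lists of (onset_s, offset_s, label) tuples.
    An estimated event is a true positive if it has the same label as a
    still-unmatched reference event, its onset is within `collar` seconds of
    the reference onset, and its offset is within
    max(collar, offset_ratio * reference duration) of the reference offset.
    """
    matched = [False] * len(reference)
    tp = 0
    for on_e, off_e, lab_e in estimated:
        for i, (on_r, off_r, lab_r) in enumerate(reference):
            if matched[i] or lab_r != lab_e:
                continue
            off_tol = max(collar, offset_ratio * (off_r - on_r))
            if abs(on_e - on_r) <= collar and abs(off_e - off_r) <= off_tol:
                matched[i] = True  # each reference event is matched at most once
                tp += 1
                break
    fp = len(estimated) - tp  # predictions with no matching reference
    fn = len(reference) - tp  # reference events that were missed
    denom = 2 * tp + fp + fn
    # Convention chosen here: no events on either side counts as perfect agreement.
    return 2 * tp / denom if denom else 1.0


ref = [(0.0, 1.0, "dog"), (2.0, 4.0, "speech")]
est = [(0.1, 1.1, "dog"), (2.5, 4.0, "speech")]  # speech onset is 0.5 s late
print(event_f1(ref, est))  # 0.5: one hit, one false positive, one miss
```

In benchmark reporting, counts are typically accumulated over the whole evaluation set and averaged per class. This sketch only shows the per-event matching rule.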
Abstract
Weakly supervised sound event detection is the task of predicting which sound events are present in a recording, together with their start and end times, when training only on a weakly labeled dataset. A weak dataset associates each training sample (a short recording) with one or more sources present in it, but not with their timing. Networks that rely solely on convolutional and recurrent layers cannot directly relate multiple frames in a recording. Motivated by attention and graph neural networks, we introduce the concept of an affinity mixup to incorporate time-level similarities and connect frames. This regularization technique mixes features in different layers using an adaptive affinity matrix. Our proposed affinity mixup network improves event-F1 scores over state-of-the-art techniques by 8.2%.
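A minimal NumPy sketch of the idea, under stated assumptions: frame features of one layer form a `(T, D)` array. The affinity matrix is computed attention-style, as a scaled dot product followed by a row-wise softmax, so it adapts to each input. The mixup step blends each frame with an affinity-weighted average of all frames, which resembles one round of graph message passing. The exact affinity formula, the mixing weight `lam`, and the function names are hypothetical illustrations, not the authors' implementation. In a network, the same function would be applied to the outputs of several layers.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def affinity_matrix(features):
    """Frame-to-frame affinity for one clip (assumed attention-style form).

    features: (T, D) array, one D-dim vector per time frame.
    Returns a (T, T) row-stochastic matrix; row t says how strongly each
    frame relates to frame t.
    """
    d = features.shape[-1]
    scores = features @ features.T / np.sqrt(d)  # scaled dot-product similarity
    return softmax(scores, axis=-1)


def affinity_mixup(features, lam=0.5):
    """Mix each frame with an affinity-weighted average of all frames.

    lam (hypothetical): 0 returns the features unchanged, 1 replaces them
    entirely with the affinity-propagated features.
    """
    a = affinity_matrix(features)
    propagated = a @ features  # each frame gathers information from similar frames
    return (1.0 - lam) * features + lam * propagated


rng = np.random.default_rng(0)
x = rng.standard_normal((100, 64))  # 100 frames, 64 channels
y = affinity_mixup(x, lam=0.3)
print(y.shape)  # (100, 64)
```

A learned variant could project the features with trainable matrices before the dot product, as attention layers usually do. The fixed form above keeps the sketch dependency-free while still making the affinity depend on the input.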
Taxonomy
Methods: Mixup
