Interactive Dual-Conformer with Scene-Inspired Mask for Soft Sound Event Detection
Han Yin, Jisheng Bai, Mou Wang, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

TL;DR
This paper introduces an interactive dual-conformer module and a scene-inspired mask for soft sound event detection, leveraging soft labels and adaptive masking to improve accuracy, achieving top performance in a recent challenge.
Contribution
It proposes a novel IDC module with cross-interaction for soft label exploitation and a scene-inspired mask with adaptive estimation, advancing soft sound event detection methods.
Findings
IDC effectively utilizes soft label information
SIM-V1 improves detection accuracy over fixed masks
SIM-V2 with word embeddings outperforms SIM-V1
Abstract
Traditional binary hard labels for sound event detection (SED) lack details about the complexity and variability of sound event distributions. A novel annotation workflow was recently proposed to generate fine-grained non-binary soft labels, resulting in a new real-life dataset named MAESTRO Real for SED. In this paper, we first propose an interactive dual-conformer (IDC) module, in which a cross-interaction mechanism is applied to effectively exploit the information from soft labels. In addition, a novel scene-inspired mask (SIM) based on soft labels is incorporated for more precise SED predictions. The SIM is initially generated through a statistical approach, referred to as SIM-V1. However, the fixed artificial mask may mismatch the SED model, resulting in limited effectiveness. Therefore, we further propose SIM-V2, which employs a word embedding model for adaptive SIM estimation.…
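To make the idea of a fixed, statistically derived scene mask more concrete, the following is a minimal sketch of one possible reading of it. It is not the authors' SIM-V1: the function names, the mean-activity rule, the `min_activity` threshold, and the toy scene and event data are all assumptions introduced for illustration. The sketch also shows what is lost when soft labels are collapsed to binary hard labels.

```python
from typing import Dict, List

Frames = List[List[float]]  # frames x event classes, soft labels in [0, 1]


def binarize(soft: Frames, threshold: float = 0.5) -> List[List[int]]:
    """Collapse soft labels to hard labels; per-frame confidence is discarded."""
    return [[1 if v >= threshold else 0 for v in frame] for frame in soft]


def scene_mask(train: Dict[str, Frames], min_activity: float = 0.1) -> Dict[str, List[int]]:
    """Hypothetical statistical mask (an assumption, not the paper's rule):
    an event is allowed in a scene if its mean soft label over that scene's
    training frames is at least min_activity."""
    mask = {}
    for scene, frames in train.items():
        n_events = len(frames[0])
        means = [sum(f[e] for f in frames) / len(frames) for e in range(n_events)]
        mask[scene] = [1 if m >= min_activity else 0 for m in means]
    return mask


def apply_mask(pred: Frames, mask: List[int]) -> Frames:
    """Zero out predictions for events the mask rules out in this scene."""
    return [[p * m for p, m in zip(frame, mask)] for frame in pred]


# Toy data: two scenes, three event classes (illustrative values only).
train = {
    "cafe": [[0.8, 0.0, 0.3], [0.6, 0.0, 0.1]],
    "street": [[0.0, 0.9, 0.2], [0.0, 0.7, 0.4]],
}
masks = scene_mask(train)
pred = [[0.5, 0.4, 0.25]]  # one frame of model output for a "cafe" clip
masked = apply_mask(pred, masks["cafe"])
hard = binarize(train["cafe"])
```

Because such a mask is computed once from training statistics and then held fixed, it cannot adapt to the model it is paired with, which is the limitation the abstract cites as motivation for the adaptive SIM-V2.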
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
