A simple model for detection of rare sound events
Weiran Wang, Chieh-chi Kao, Chao Wang

TL;DR
This paper introduces a simple recurrent model that detects rare sound events by combining an utterance-level and a frame-level classification loss, linked by an attention mechanism, and reports competitive results on Task 2 of the DCASE 2017 challenge.
Contribution
The paper presents a novel recurrent model that jointly optimizes utterance and frame-level losses with attention, specifically designed for rare sound event detection.
Findings
Achieved competitive performance on DCASE 2017 Task 2
Effectively combines utterance and frame-level classification
Utilizes attention mechanism for improved detection
Abstract
We propose a simple recurrent model for detecting rare sound events when the time boundaries of events are available for training. Our model optimizes the combination of an utterance-level loss, which classifies whether an event occurs in an utterance, and a frame-level loss, which classifies whether each frame corresponds to the event when it does occur. The two losses make use of a shared vectorial representation of the event, and are connected by an attention mechanism. We demonstrate our model on Task 2 of the DCASE 2017 challenge, and achieve competitive performance.
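To make the two-loss idea concrete, here is a minimal NumPy sketch of one way such a joint objective could be computed. It is not the authors' implementation. Several things are assumptions for illustration only: the function name joint_loss, the use of a dot product with a single shared event vector w for both frame scores and the utterance score, softmax attention over the frame scores, and the interpolation weight alpha. The frame features h stand in for the outputs of a recurrent encoder, which is not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def bce(p, y, eps=1e-7):
    # Binary cross-entropy, clipped for numerical safety.
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def joint_loss(h, w, utt_label, frame_labels, alpha=0.5):
    """Hypothetical joint utterance/frame loss.

    h            : (T, D) frame features (assumed to come from a recurrent encoder)
    w            : (D,)   shared event vector (assumed form of the shared representation)
    utt_label    : 1 if the event occurs in the utterance, else 0
    frame_labels : (T,)   per-frame 0/1 labels derived from the time boundaries
    alpha        : assumed weight balancing the two losses
    """
    scores = h @ w                           # per-frame similarity to the event vector
    attn = np.exp(scores - scores.max())     # softmax attention over frames
    attn /= attn.sum()
    context = attn @ h                       # attention-pooled utterance summary, shape (D,)

    utt_loss = bce(sigmoid(context @ w), utt_label)
    # The frame-level loss is applied only when the event does occur.
    if utt_label == 1:
        frame_loss = bce(sigmoid(scores), frame_labels).mean()
    else:
        frame_loss = 0.0
    return alpha * utt_loss + (1.0 - alpha) * frame_loss, attn
```

In this sketch the same vector w scores individual frames and the pooled utterance summary, which is one simple reading of "shared vectorial representation"; the paper itself should be consulted for the actual formulation.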
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Topic Modeling
