A simple model for detection of rare sound events

Weiran Wang; Chieh-chi Kao; Chao Wang

arXiv:1808.06676 · cs.SD · August 22, 2018

TL;DR

This paper introduces a simple recurrent model for detecting rare sound events when event time boundaries are available for training. It combines an utterance-level and a frame-level classification loss, connected by an attention mechanism, and reports competitive results on Task 2 of the DCASE 2017 challenge.

Contribution

The paper presents a recurrent model for rare sound event detection that jointly optimizes an utterance-level loss and a frame-level loss. The two losses share a vector representation of the event and are connected by an attention mechanism.

Findings

01. Achieved competitive performance on Task 2 of the DCASE 2017 challenge

02. Combines utterance-level and frame-level classification losses in one objective

03. Connects the two losses through an attention mechanism over a shared event representation

Abstract

We propose a simple recurrent model for detecting rare sound events when the time boundaries of events are available for training. Our model optimizes the combination of an utterance-level loss, which classifies whether an event occurs in an utterance, and a frame-level loss, which classifies whether each frame corresponds to the event when it does occur. The two losses make use of a shared vectorial representation of the event, and are connected by an attention mechanism. We demonstrate our model on Task 2 of the DCASE 2017 challenge, and achieve competitive performance.
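
The abstract does not give the equations, so the following is only a minimal sketch of how an utterance-level loss and a frame-level loss could share one event vector and be linked by attention. Everything specific here is an assumption made for illustration, not the authors' formulation: the dot-product scoring, the softmax attention pooling, the binary cross-entropy losses, the weight `alpha`, and the names `combined_loss`, `frames`, and `event_vec`. In the paper the frame vectors would come from a recurrent network; here they are plain lists of floats.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bce(p, y, eps=1e-12):
    # Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}.
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


def combined_loss(frames, event_vec, frame_labels, alpha=1.0):
    """Hypothetical joint objective: utterance loss + alpha * frame loss.

    frames       : list of per-frame feature vectors (stand-in for RNN outputs)
    event_vec    : shared event representation (assumed to be a single vector)
    frame_labels : 0/1 label per frame, derived from the event time boundaries
    """
    # Assumption: each frame is scored against the shared event vector.
    scores = [dot(event_vec, h) for h in frames]

    # Assumption: softmax over frame scores gives the attention weights.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]

    # Attention-weighted pooling of the frames into one utterance vector.
    dim = len(event_vec)
    pooled = [sum(a * h[d] for a, h in zip(attn, frames)) for d in range(dim)]

    # Utterance-level loss: does the event occur anywhere in the utterance?
    utt_label = 1 if any(frame_labels) else 0
    utt_loss = bce(sigmoid(dot(event_vec, pooled)), utt_label)

    # Frame-level loss: applied only when the event does occur.
    if utt_label:
        frame_loss = sum(
            bce(sigmoid(s), y) for s, y in zip(scores, frame_labels)
        ) / len(frames)
    else:
        frame_loss = 0.0

    return utt_loss + alpha * frame_loss, utt_loss, frame_loss, attn


# Toy utterance: three frames, the event occupies the middle frame.
frames = [[0.0, 0.0], [2.0, 1.0], [0.0, 0.0]]
event_vec = [1.0, 1.0]
loss, utt_loss, frame_loss, attn = combined_loss(
    frames, event_vec, [0, 1, 0], alpha=0.5
)
```

In this sketch the same `event_vec` produces both the frame scores and the attention weights, which is one simple way the two losses could be tied together. For an utterance with no event, only the utterance-level term contributes.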

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Topic Modeling