R-CRNN: Region-based Convolutional Recurrent Neural Network for Audio Event Detection

Chieh-Chi Kao; Weiran Wang; Ming Sun; Chao Wang

arXiv:1808.06627·cs.SD·August 22, 2018·6 cites

TL;DR

This paper introduces R-CRNN, a novel region-based convolutional recurrent neural network for audio event detection that directly predicts event-level outputs and outperforms existing single-model methods on benchmark datasets.

Contribution

The paper presents R-CRNN, a model that combines region-based detection with a recurrent layer for audio event detection and is trained end-to-end with a multitask loss.

Findings

01 R-CRNN achieves state-of-the-art performance without ensemble methods.

02 It reduces the event-based error rate by half compared to previous region-based networks.

03 The model effectively captures long-term temporal context for accurate event localization.

Abstract

This paper proposes a Region-based Convolutional Recurrent Neural Network (R-CRNN) for audio event detection (AED). The proposed network is inspired by Faster-RCNN, a well-known region-based convolutional network framework for visual object detection. Unlike the original Faster-RCNN, it adds a recurrent layer on top of the convolutional network to capture long-term temporal context from the extracted high-level features. Most previous work on AED first generates frame-level predictions and then uses post-processing to predict the onset/offset timestamps of events from a probability sequence. In contrast, the proposed method generates event-level predictions directly and can be trained end-to-end with a multitask loss, which optimizes the classification and localization of audio events simultaneously. The proposed method is tested on the DCASE 2017 Challenge dataset. To…
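The multitask loss mentioned in the abstract can be illustrated with a minimal sketch. The excerpt does not give the exact loss terms, so the form below is an assumption borrowed from the Fast/Faster R-CNN convention: a softmax cross-entropy term for the event class plus a smooth L1 term on the 1-D region offsets (center and length), applied only to non-background regions. The names `multitask_loss`, `loc_weight`, and `background` are hypothetical.

```python
import math


def smooth_l1(x, beta=1.0):
    """Smooth L1 (Huber-style) penalty: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def cross_entropy(logits, label):
    """Softmax cross-entropy for one region, using a stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def multitask_loss(logits, label, pred_offsets, target_offsets,
                   loc_weight=1.0, background=0):
    """Classification + localization loss for a single proposed region.

    Assumption: offsets are (center, length) along the time axis, and the
    localization term is skipped for background regions, as in Fast R-CNN.
    """
    cls = cross_entropy(logits, label)
    if label == background:
        return cls
    loc = sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))
    return cls + loc_weight * loc


# A background region contributes only the classification term.
bg_loss = multitask_loss([0.0, 0.0], 0, (1.0, 1.0), (0.0, 0.0))

# An event region adds the localization term on its (center, length) offsets.
ev_loss = multitask_loss([0.0, 0.0], 1, (0.5, 2.0), (0.0, 0.0))
```

Summing the two terms lets a single backward pass update the network for both tasks, which is what allows event-level onset/offset predictions without a separate post-processing stage.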

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Digital Media Forensic Detection