R-CRNN: Region-based Convolutional Recurrent Neural Network for Audio Event Detection
Chieh-Chi Kao, Weiran Wang, Ming Sun, Chao Wang

TL;DR
This paper introduces R-CRNN, a novel region-based convolutional recurrent neural network for audio event detection that directly predicts event-level outputs and outperforms existing single-model methods on benchmark datasets.
Contribution
The paper presents a new R-CRNN model that combines region-based detection with recurrent layers for improved audio event detection, trained end-to-end with multitask loss.
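To make the idea of a multitask loss concrete, here is a minimal sketch of a Faster-RCNN-style objective adapted to one-dimensional (time) regions. It is not the authors' implementation. The function names, the choice of smooth L1 for localization, the weight `lam`, and the convention that class 0 means background are all assumptions made for illustration.

```python
import math

def smooth_l1(x, beta=1.0):
    """Smooth L1 penalty: quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single region proposal."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def multitask_loss(logits, label, pred_offsets, target_offsets, lam=1.0):
    """Classification loss plus weighted localization loss for one proposal.

    Assumption: label 0 is background, so no localization term is applied.
    Offsets are hypothetical (center, length) regression targets in time.
    """
    cls = cross_entropy(logits, label)
    loc = 0.0
    if label != 0:
        loc = sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))
    return cls + lam * loc
```

Because both terms are summed into one scalar, a single backward pass would update the classification and localization branches together, which is the sense in which the two tasks are optimized simultaneously.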
Findings
R-CRNN achieves state-of-the-art performance without ensemble methods.
It reduces the event-based error rate by half compared to previous region-based networks.
The model effectively captures long-term temporal context for accurate event localization.
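For readers unfamiliar with the metric, the event-based error rate used in DCASE-style evaluation is commonly defined as the number of substitutions, deletions, and insertions divided by the number of reference events. The sketch below only shows that arithmetic. The function name and the counts in the usage line are hypothetical and are not results from the paper.

```python
def event_based_error_rate(substitutions, deletions, insertions, n_reference):
    """Error rate ER = (S + D + I) / N, where N is the number of reference events."""
    if n_reference <= 0:
        raise ValueError("n_reference must be positive")
    return (substitutions + deletions + insertions) / n_reference

# Hypothetical counts for illustration only (not taken from the paper).
er = event_based_error_rate(substitutions=1, deletions=2, insertions=1, n_reference=20)
```

A lower value is better, and the rate can exceed 1.0 when a system produces many insertions.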
Abstract
This paper proposes a Region-based Convolutional Recurrent Neural Network (R-CRNN) for audio event detection (AED). The proposed network is inspired by Faster-RCNN, a well-known region-based convolutional network framework for visual object detection. Unlike the original Faster-RCNN, R-CRNN adds a recurrent layer on top of the convolutional network to capture long-term temporal context from the extracted high-level features. Most previous work on AED first generates predictions at the frame level and then uses post-processing to predict the onset/offset timestamps of events from a probability sequence. The proposed method instead generates predictions at the event level directly and can be trained end-to-end with a multitask loss, which optimizes the classification and localization of audio events simultaneously. The proposed method is tested on the DCASE 2017 Challenge dataset. To…
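As a point of contrast, the frame-level pipeline that the abstract attributes to earlier work can be sketched as thresholding a per-frame probability sequence and merging consecutive active frames into events. This is a generic illustration, not a specific prior system. The function name, the threshold, the hop size, and the minimum-duration rule are assumptions.

```python
def frames_to_events(probs, threshold=0.5, hop_seconds=0.02, min_frames=1):
    """Convert frame-level probabilities into (onset, offset) pairs in seconds.

    Assumptions: a fixed threshold marks a frame as active, frames are
    hop_seconds apart, and runs shorter than min_frames are discarded.
    """
    events = []
    start = None  # index of the first frame in the current active run
    for i, p in enumerate(probs):
        active = p >= threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            if i - start >= min_frames:
                events.append((start * hop_seconds, i * hop_seconds))
            start = None
    # Close a run that is still active at the end of the sequence.
    if start is not None and len(probs) - start >= min_frames:
        events.append((start * hop_seconds, len(probs) * hop_seconds))
    return events
```

Choices such as the threshold and minimum duration are tuned separately from the network in this kind of pipeline, whereas an event-level model predicts onset and offset as part of its trained output.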
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Digital Media Forensic Detection
