# Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

**Authors:** Emre Çakır, Giambattista Parascandolo, Toni Heittola, Heikki Huttunen, Tuomas Virtanen

arXiv: 1702.06286 · 2017-05-31

## TL;DR

This paper combines convolutional and recurrent layers into a Convolutional Recurrent Neural Network (CRNN) for polyphonic sound event detection, and reports a considerable improvement over CNN, RNN, and other established methods on four datasets of everyday sound events.

## Contribution

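The paper proposes a CRNN architecture in which convolutional layers extract higher-level features that are invariant to local spectral and temporal variations, and recurrent layers model the longer-term temporal context of the audio signal. The combined model is evaluated on polyphonic sound event detection against CNN, RNN, and other established methods.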
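To make this division of labour concrete, the following is a minimal shape-tracing sketch, not the authors' implementation. It assumes a spectrogram-like input of `n_frames` by `n_bands`, 'same'-padded convolutions, and pooling along the frequency axis only so that the frame rate is preserved for the recurrent layers. The function name `crnn_shapes` and every layer size are hypothetical and are not taken from this summary.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, ...]


def crnn_shapes(
    n_frames: int,
    n_bands: int,
    conv_filters: Sequence[int],
    freq_pools: Sequence[int],
    rnn_units: int,
    n_classes: int,
) -> List[Tuple[str, Shape]]:
    """Trace feature-map shapes through a hypothetical CRNN.

    Assumptions (illustrative only): 'same'-padded convolutions and
    max pooling applied to the frequency axis, never to the time axis.
    """
    shapes: List[Tuple[str, Shape]] = [("input", (1, n_frames, n_bands))]
    channels, bands = 1, n_bands

    # Convolutional part: local spectro-temporal feature extraction.
    for i, (filters, pool) in enumerate(zip(conv_filters, freq_pools), start=1):
        channels = filters      # number of feature maps after the convolution
        bands = bands // pool   # frequency axis shrinks, time axis is untouched
        if bands < 1:
            raise ValueError("frequency axis pooled away; use smaller pool sizes")
        shapes.append((f"conv{i}+pool", (channels, n_frames, bands)))

    # Stack the feature maps per frame so each frame becomes one vector.
    shapes.append(("stack", (n_frames, channels * bands)))

    # Recurrent part: longer-term temporal context, one state per frame.
    shapes.append(("rnn", (n_frames, rnn_units)))

    # Frame-wise output: one activity score per event class.
    shapes.append(("output", (n_frames, n_classes)))
    return shapes


if __name__ == "__main__":
    for name, shape in crnn_shapes(256, 40, [96, 96, 96], [5, 4, 2], 96, 6):
        print(name, shape)
```

Because the time axis is never pooled in this sketch, the output keeps one prediction per input frame, which is what a frame-level detection task needs.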

## Key findings

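- The CRNN outperforms standalone CNN and RNN models on four datasets
- The authors report a considerable improvement over other established methods
- The evaluation covers everyday sound events, which vary widely in frequency content and temporal structure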

## Abstract

Sound events often occur in unstructured environments where they exhibit wide variations in their frequency content and temporal structure. Convolutional neural networks (CNNs) are able to extract higher-level features that are invariant to local spectral and temporal variations. Recurrent neural networks (RNNs) are powerful in learning the longer-term temporal context in the audio signals. CNNs and RNNs as classifiers have recently shown improved performances over established methods in various sound recognition tasks. We combine these two approaches in a Convolutional Recurrent Neural Network (CRNN) and apply it on a polyphonic sound event detection task. We compare the performance of the proposed CRNN method with CNN, RNN, and other established methods, and observe a considerable improvement for four different datasets consisting of everyday sound events.
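In a polyphonic task several events can be active in the same frame, so the network output is usually treated as an independent activity probability per class and per frame. The sketch below shows one common way to turn such probabilities into timed events. It is an illustration under stated assumptions, not the paper's post-processing: the function name `decode_events`, the 0.5 threshold, the hop length, and the class labels are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def decode_events(
    probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    threshold: float = 0.5,    # assumed decision threshold
    hop_seconds: float = 0.02,  # assumed time between consecutive frames
) -> Dict[str, List[Tuple[float, float]]]:
    """Convert frame-wise class probabilities into (onset, offset) pairs.

    probs[t][k] is the probability that class k is active in frame t.
    Each class is decoded independently, so events may overlap in time.
    """
    events: Dict[str, List[Tuple[float, float]]] = {label: [] for label in labels}
    for k, label in enumerate(labels):
        onset = None
        for t, frame in enumerate(probs):
            active = frame[k] >= threshold
            if active and onset is None:
                onset = t  # event starts at this frame
            elif not active and onset is not None:
                events[label].append((onset * hop_seconds, t * hop_seconds))
                onset = None
        if onset is not None:
            # Event still active at the end of the clip: close it there.
            events[label].append((onset * hop_seconds, len(probs) * hop_seconds))
    return events


if __name__ == "__main__":
    # Three frames, two hypothetical classes; both are active in frame 1.
    demo = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9]]
    print(decode_events(demo, ["speech", "car"], hop_seconds=1.0))
```

Decoding each class separately is what allows overlapping events; a single softmax over classes would force exactly one label per frame.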

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.06286/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1702.06286/full.md

## References

57 references — full list in the complete paper: https://tomesphere.com/paper/1702.06286/full.md

---
Source: https://tomesphere.com/paper/1702.06286