# Sound Event Detection Using Spatial Features and Convolutional Recurrent Neural Network

**Authors:** Sharath Adavanne, Pasi Pertilä, Tuomas Virtanen

arXiv: 1706.02291 · 2017-06-09

## TL;DR

This paper proposes sound event detection from low-level spatial features extracted from multichannel audio. The features are fed to a convolutional recurrent neural network as separate layers of a volume, and the paper reports higher F-scores than with monaural features on the same network.

## Contribution

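It extends the convolutional recurrent neural network to handle more than one type of multichannel feature by learning from each feature type separately in the initial stages, and by presenting the channels as separate layers of a volume rather than as a single concatenated feature vector.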

## Key findings

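- Absolute F-score improvement of 6.1% over monaural features on the publicly available TUT-SED 2016 dataset, using the same network.
- Absolute F-score improvement of 2.7% on the TUT-SED 2009 dataset, which is fifteen times larger.
- The network learns sound events in multichannel audio better when per-channel features are presented as separate layers of a volume than when they are concatenated into a single feature vector.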

## Abstract

This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection. We extend the convolutional recurrent neural network to handle more than one type of these multichannel features by learning from each of them separately in the initial stages. We show that instead of concatenating the features of each channel into a single feature vector the network learns sound events in multichannel audio better when they are presented as separate layers of a volume. Using the proposed spatial features over monaural features on the same network gives an absolute F-score improvement of 6.1% on the publicly available TUT-SED 2016 dataset and 2.7% on the TUT-SED 2009 dataset that is fifteen times larger.
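The difference between concatenating per-channel features and presenting them as separate layers of a volume can be illustrated with a minimal sketch. All sizes, values, and function names below are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch shows only the two input layouts, not the network itself.

```python
# Hypothetical sizes, chosen only for illustration (not from the paper).
N_CHANNELS = 2   # e.g. a stereo recording
N_FRAMES = 4     # time frames
N_BANDS = 3      # feature bins per frame


def make_channel_features(channel):
    """Return a dummy (frames x bands) feature matrix for one channel."""
    return [[channel * 100 + t * 10 + b for b in range(N_BANDS)]
            for t in range(N_FRAMES)]


def concatenate_channels(per_channel):
    """Join the channels along the feature axis: frames x (channels * bands)."""
    n_frames = len(per_channel[0])
    return [[value for feats in per_channel for value in feats[t]]
            for t in range(n_frames)]


def stack_as_volume(per_channel):
    """Keep each channel as its own layer: channels x frames x bands."""
    return [[row[:] for row in feats] for feats in per_channel]


def shape(nested):
    """Shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


features = [make_channel_features(c) for c in range(N_CHANNELS)]
concatenated = concatenate_channels(features)
volume = stack_as_volume(features)

print(shape(concatenated))  # (4, 6)
print(shape(volume))        # (2, 4, 3)
```

In the concatenated layout, the channel identity of each value is implicit in its position along the feature axis. In the volume layout, each channel keeps its own time-frequency plane, so a convolutional layer can treat the channels as parallel input maps over the same time-frequency grid.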

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.02291/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1706.02291/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1706.02291/full.md

---
Source: https://tomesphere.com/paper/1706.02291