A report on sound event detection with different binaural features
Sharath Adavanne, Tuomas Virtanen

TL;DR
This paper compares binaural audio features with single-channel features for sound event detection and finds that the binaural features perform equal to or better than the single-channel features on the error rate metric.
Contribution
A comparative analysis of three binaural features against single-channel features for sound event detection, using a stacked convolutional and recurrent neural network on the publicly available TUT Sound Events 2017 dataset.
Findings
Binaural features consistently perform equal to or better than single-channel features with respect to the error rate metric.
Results are reported using the standard metrics of error rate and F-score.
The evaluation uses the publicly available TUT Sound Events 2017 dataset (70 minutes of audio).
Abstract
In this paper, we compare the performance of using binaural audio features in place of single-channel features for sound event detection. Three different binaural features are studied and evaluated on the publicly available TUT Sound Events 2017 dataset, which is 70 minutes long. Sound event detection is performed separately with single-channel and binaural features using a stacked convolutional and recurrent neural network, and the evaluation is reported using the standard metrics of error rate and F-score. The studied binaural features are seen to consistently perform equal to or better than the single-channel features with respect to the error rate metric.
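The abstract names error rate and F-score as the evaluation metrics but does not say which variant is used. As a rough illustration only, the sketch below assumes the segment-based definitions commonly used in sound event detection evaluation: within each segment, substitutions are min(FN, FP), deletions are max(0, FN - FP), insertions are max(0, FP - FN), and the error rate is their sum over all segments divided by the number of active reference events. The function name `segment_metrics` and the toy activity matrices are hypothetical and are not taken from the paper.

```python
def segment_metrics(reference, estimated):
    """Segment-based error rate and F-score (illustrative sketch).

    reference, estimated: lists of segments; each segment is a list of
    0/1 activity flags, one per sound event class.
    """
    tp = fp = fn = 0
    subs = dels = ins = 0
    n_ref = 0
    for ref_seg, est_seg in zip(reference, estimated):
        seg_tp = sum(1 for r, e in zip(ref_seg, est_seg) if r and e)
        seg_fp = sum(1 for r, e in zip(ref_seg, est_seg) if e and not r)
        seg_fn = sum(1 for r, e in zip(ref_seg, est_seg) if r and not e)
        tp += seg_tp
        fp += seg_fp
        fn += seg_fn
        # A false positive paired with a false negative in the same
        # segment counts as one substitution; leftovers are pure
        # deletions (missed events) or insertions (extra events).
        subs += min(seg_fn, seg_fp)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)
        n_ref += sum(ref_seg)
    error_rate = (subs + dels + ins) / n_ref if n_ref else 0.0
    denom = 2 * tp + fp + fn
    f_score = 2 * tp / denom if denom else 0.0
    return error_rate, f_score


# Toy data (hypothetical): 4 segments, 3 event classes.
reference = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
estimated = [[1, 0, 0], [1, 0, 1], [0, 0, 0], [0, 0, 1]]
er, f1 = segment_metrics(reference, estimated)
print(f"ER = {er:.2f}, F = {f1:.2f}")  # ER = 0.40, F = 0.67
```

In the toy data there is one substitution (segment 2), one deletion (segment 3), and five active reference events, so the error rate is 2/5. A lower error rate is better, whereas a higher F-score is better, which is why a feature can match on one metric and differ on the other.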
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis
