Efficient Target Activity Detection based on Recurrent Neural Networks

Daniel Gerber, Stefan Meier, and Walter Kellermann

arXiv:1612.06642 · cs.SD · December 21, 2016

TL;DR

This paper compares different neural network architectures for target activity detection (TAD) in binaural listening devices, demonstrating that recurrent neural networks (RNNs) outperform feed-forward neural networks (FNNs) in challenging acoustic environments.

Contribution

The paper applies RNNs, namely plain RNNs, LSTMs, and GRUs, to TAD and evaluates them against FNNs using small network topologies.
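As a rough illustration of how a small recurrent detector produces frame-wise decisions, the hypothetical NumPy sketch below runs a single GRU layer (using the original Cho et al. gate formulation, with the reset gate applied before the recurrent matrix) over a sequence of per-frame TAD feature vectors and emits a target-activity probability per frame. The class name `TinyGRUDetector`, the number of features, the hidden size, the random untrained weights, and the 0.5 decision threshold are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyGRUDetector:
    """Hypothetical small GRU mapping per-frame TAD features to P(target active)."""

    def __init__(self, n_features, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1  # small random init; these weights are untrained
        # Index 0: update gate z, 1: reset gate r, 2: candidate state n
        self.W = rng.standard_normal((3, n_hidden, n_features)) * scale
        self.U = rng.standard_normal((3, n_hidden, n_hidden)) * scale
        self.b = np.zeros((3, n_hidden))
        # Output layer: hidden state -> single sigmoid unit
        self.w_out = rng.standard_normal(n_hidden) * scale
        self.b_out = 0.0
        self.n_hidden = n_hidden

    def forward(self, features):
        """features: shape (T, n_features); returns shape (T,) probabilities."""
        h = np.zeros(self.n_hidden)
        probs = np.empty(len(features))
        for t, x in enumerate(features):
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1.0 - z) * n + z * h  # blend new candidate with previous state
            probs[t] = sigmoid(self.w_out @ h + self.b_out)
        return probs

# Example: 50 frames of 4 hypothetical TAD features
features = np.random.default_rng(1).standard_normal((50, 4))
detector = TinyGRUDetector(n_features=4, n_hidden=8)
probs = detector.forward(features)
decisions = probs > 0.5  # binary target-activity decision per frame
```

Because the hidden state `h` is carried from frame to frame, the detector accumulates temporal context without an explicit input window; swapping in an LSTM or a plain tanh RNN cell changes only the per-frame update inside the loop.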

Findings

1. RNNs outperform FNNs for TAD.
2. All three RNN variants (plain RNN, LSTM, GRU) improve detection accuracy over FNNs.
3. Small RNNs are suitable for embedded systems.
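To make the "small network" argument concrete, the sketch below counts the trainable parameters of a single recurrent layer plus one sigmoid output unit. It assumes one bias vector per gate block (some toolkits, e.g. PyTorch, keep two), and the sizes of 4 input features and 8 hidden units are hypothetical, not taken from the paper.

```python
def recurrent_layer_params(n_in, n_hidden, n_gates):
    # Each gate block: input matrix (H x F), recurrent matrix (H x H), one bias (H)
    return n_gates * (n_hidden * n_in + n_hidden * n_hidden + n_hidden)

# Number of gate/candidate blocks per cell type
GATE_BLOCKS = {"plain RNN": 1, "GRU": 3, "LSTM": 4}

def model_params(kind, n_in, n_hidden):
    # Recurrent layer plus a single sigmoid output unit (H weights + 1 bias)
    return recurrent_layer_params(n_in, n_hidden, GATE_BLOCKS[kind]) + n_hidden + 1

for kind in GATE_BLOCKS:
    print(kind, model_params(kind, n_in=4, n_hidden=8))
# prints one line each: plain RNN 113, GRU 321, LSTM 425
```

At a fixed hidden size, the GRU and LSTM recurrent layers cost three and four times the parameters of a plain RNN layer, so the hidden size is the main lever for fitting a detector into an embedded memory and compute budget.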

Abstract

This paper addresses the problem of Target Activity Detection (TAD) for binaural listening devices. TAD denotes the problem of robustly detecting the activity of a target speaker in a harsh acoustic environment, which comprises interfering speakers and noise (cocktail party scenario). In previous work, it has been shown that employing a Feed-forward Neural Network (FNN) for detecting the target speaker activity is a promising approach to combine the advantages of different TAD features (used as network inputs). In this contribution, we exploit a larger context window for TAD and compare the performance of FNNs and Recurrent Neural Networks (RNNs) with an explicit focus on small network topologies as desirable for embedded acoustic signal processing systems. More specifically, the investigations include a comparison between three different types of RNNs, namely plain RNNs, Long Short-Term…
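The abstract states that a larger context window is exploited and that the FNN takes several TAD features as inputs. One common way to give a feed-forward network temporal context is to stack neighboring feature frames into a single input vector. The hypothetical NumPy sketch below does this; the function names, the edge padding by repeating the first/last frame, the window of two past frames, and the layer sizes are assumptions for illustration, and the paper's actual context-window construction may differ.

```python
import numpy as np

def stack_context(features, left, right):
    """Concatenate every frame with its `left` past and `right` future frames.

    features: array of shape (T, F) holding per-frame TAD features.
    Returns an array of shape (T, (left + 1 + right) * F).
    Edges are padded by repeating the first/last frame (an assumption).
    """
    T = features.shape[0]
    padded = np.concatenate([
        np.repeat(features[:1], left, axis=0),
        features,
        np.repeat(features[-1:], right, axis=0),
    ])
    # Column block j holds frame t - left + j (clamped at the edges)
    return np.concatenate([padded[j:j + T] for j in range(left + 1 + right)], axis=1)

def fnn_forward(x, W1, b1, w2, b2):
    """One tanh hidden layer followed by a single sigmoid output unit."""
    hidden = np.tanh(x @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))

# Example: 6 frames of 2 hypothetical features, causal window of 2 past frames
features = np.arange(12, dtype=float).reshape(6, 2)
stacked = stack_context(features, left=2, right=0)
print(stacked[3])  # frames 1, 2, 3 -> [2. 3. 4. 5. 6. 7.]

# Untrained random weights, for shape illustration only
rng = np.random.default_rng(0)
n_in, n_hidden = stacked.shape[1], 5
frame_probs = fnn_forward(
    stacked,
    rng.standard_normal((n_in, n_hidden)) * 0.1,
    np.zeros(n_hidden),
    rng.standard_normal(n_hidden) * 0.1,
    0.0,
)
```

A purely causal window (`right=0`) avoids look-ahead latency, which may matter for listening devices. With frame stacking, the FNN input size grows linearly with the window length, whereas an RNN carries context in its hidden state at a fixed input size.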

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis