Online Self-Attentive Gated RNNs for Real-Time Speaker Separation

Ori Kabeli; Yossi Adi; Zhenyu Tang; Buye Xu; Anurag Kumar

arXiv:2106.13493 · eess.AS · July 28, 2021

TL;DR

This paper converts a non-causal deep neural network for speaker separation into a causal, real-time model whose separation quality stays close to the offline model across anechoic, noisy, and noisy-reverberant conditions.

Contribution

It introduces a causal, online version of a state-of-the-art speaker separation model, enabling real-time processing with only slight performance degradation.
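A causal model may not let any output frame depend on future input. The page does not say how the authors enforce this for the self-attention layers named in the title. One common approach, assumed here, is to mask the strict upper triangle of the attention scores so that frame t attends only to frames 0..t. The function `self_attention` below is a hypothetical single-head layer with identity Q/K/V projections. It is a minimal NumPy sketch, not the paper's layer.

```python
import numpy as np


def self_attention(x: np.ndarray, causal: bool) -> np.ndarray:
    """Single-head dot-product self-attention over frames x of shape (T, D).

    Identity projections stand in for learned Q, K, V to keep the sketch short.
    """
    T, D = x.shape
    scores = x @ x.T / np.sqrt(D)  # (T, T) frame-to-frame similarities
    if causal:
        # Frame t may only attend to frames 0..t: mask the strict upper triangle.
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # masked entries become exactly 0
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x


rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
x_changed = x.copy()
x_changed[4:] += 10.0  # perturb only the last two frames

y = self_attention(x, causal=True)
y2 = self_attention(x_changed, causal=True)
print(np.allclose(y[:4], y2[:4]))  # earlier outputs ignore the future frames
```

With `causal=False`, the same perturbation changes every earlier output, which is why a non-causal model needs the full signal before it can separate anything.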

Findings

01

Minor performance drop of 0.8 dB for monaural inputs and 0.3 dB for binaural inputs compared with the offline model

02

Achieves a real-time factor of 0.65 for online separation

03

Performs well under anechoic, noisy, and noisy-reverberant recording conditions
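The real-time factor (RTF) is the wall-clock processing time divided by the duration of the audio processed. An RTF below 1 means the model runs faster than real time, so at 0.65, one second of audio takes about 0.65 s to separate. The minimal sketch below shows the computation. `real_time_factor` and `measure_rtf` are hypothetical helpers, and the 10 s / 6.5 s figures are illustrative rather than measurements from the paper.

```python
import time
from typing import Callable, Sequence


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock processing time / duration of the processed audio."""
    return processing_seconds / audio_seconds


def measure_rtf(process: Callable[[Sequence[float]], object],
                audio: Sequence[float], sample_rate: int) -> float:
    """Time a single call of `process` on `audio` and return its RTF."""
    start = time.perf_counter()
    process(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(audio) / sample_rate)


# Illustrative: 10 s of audio separated in 6.5 s corresponds to an RTF of 0.65.
print(real_time_factor(6.5, 10.0))
```

For streaming use, the same ratio applies per chunk. Each chunk must be processed in less time than it lasts, or latency accumulates.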

Abstract

Deep neural networks have recently shown great success in the task of blind source separation, both under monaural and binaural settings. Although these methods were shown to produce high-quality separations, they were mainly applied under offline settings, in which the model has access to the full input signal while separating the signal. In this study, we convert a non-causal state-of-the-art separation model into a causal and real-time model and evaluate its performance under both online and offline settings. We compare the performance of the proposed model to several baseline methods under anechoic, noisy, and noisy-reverberant recording conditions while exploring both monaural and binaural inputs and outputs. Our findings shed light on the relative difference between causal and non-causal models when performing separation. Our stateful implementation for online separation leads to…
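The abstract attributes the online results to a stateful implementation. This page does not give its details, but the general idea can be sketched. A causal (unidirectional) gated RNN that processes audio in chunks carries its hidden state from one chunk to the next, so its online output matches processing the whole signal at once. `TinyGRU` below is a hypothetical toy GRU with random weights in NumPy, not the paper's architecture, and the 4-frame chunk size is arbitrary.

```python
import numpy as np


def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))


class TinyGRU:
    """Minimal unidirectional GRU (a gated RNN) with random weights."""

    def __init__(self, in_dim: int, hid_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.hid_dim = hid_dim
        # One weight matrix per gate, each acting on the concatenation [x_t, h].
        self.W_z, self.W_r, self.W_h = (
            rng.standard_normal((in_dim + hid_dim, hid_dim)) * 0.3 for _ in range(3)
        )

    def step(self, x_t: np.ndarray, h: np.ndarray) -> np.ndarray:
        xh = np.concatenate([x_t, h])
        z = sigmoid(xh @ self.W_z)  # update gate
        r = sigmoid(xh @ self.W_r)  # reset gate
        h_tilde = np.tanh(np.concatenate([x_t, r * h]) @ self.W_h)
        return (1.0 - z) * h + z * h_tilde

    def run(self, frames: np.ndarray, h=None):
        """Process frames of shape (T, in_dim); return outputs and final state."""
        h = np.zeros(self.hid_dim) if h is None else h
        outputs = []
        for x_t in frames:
            h = self.step(x_t, h)
            outputs.append(h)
        return np.stack(outputs), h


rng = np.random.default_rng(1)
signal = rng.standard_normal((20, 3))  # 20 frames of 3 features each
gru = TinyGRU(in_dim=3, hid_dim=5)

offline, _ = gru.run(signal)  # whole sequence at once

state, chunks = None, []
for start in range(0, len(signal), 4):  # online: 4-frame chunks
    out, state = gru.run(signal[start:start + 4], state)  # carry state forward
    chunks.append(out)
online = np.concatenate(chunks)

print(np.allclose(offline, online))
```

A bidirectional RNN could not be split this way, because its backward pass needs frames that have not arrived yet.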

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing