Online Self-Attentive Gated RNNs for Real-Time Speaker Separation
Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar

TL;DR
This paper converts a non-causal deep neural network for speaker separation into a causal, real-time model that retains most of the offline model's separation quality across anechoic, noisy, and noisy-reverberant conditions.
Contribution
It introduces a causal, online version of a state-of-the-art speaker separation model, enabling real-time processing with only slight performance degradation.
Findings
Minor performance drop relative to the offline model: 0.8 dB in the monaural setting and 0.3 dB in the binaural setting
Achieves a real-time factor of 0.65 for online separation
Performs effectively under anechoic, noisy, and noisy-reverberant recording conditions
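To make the real-time factor figure concrete, here is a minimal sketch that assumes the common definition of RTF as processing time divided by the duration of the audio processed. The function names `real_time_factor` and `measure_rtf`, and the `separate` callable, are hypothetical and do not come from the paper.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF under the assumed definition: processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(separate, samples, sample_rate: int) -> float:
    """Time one call to a hypothetical `separate` function and return its RTF."""
    start = time.perf_counter()
    separate(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


if __name__ == "__main__":
    # 0.65 s of compute for 1 s of audio gives the RTF reported above.
    print(real_time_factor(0.65, 1.0))
```

Under this definition, an RTF below 1 means the model processes audio faster than it arrives, which is the condition for keeping up with a live stream.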
Abstract
Deep neural networks have recently shown great success in the task of blind source separation, both under monaural and binaural settings. Although these methods were shown to produce high-quality separations, they were mainly applied under offline settings, in which the model has access to the full input signal while separating the signal. In this study, we convert a non-causal state-of-the-art separation model into a causal and real-time model and evaluate its performance under both online and offline settings. We compare the performance of the proposed model to several baseline methods under anechoic, noisy, and noisy-reverberant recording conditions while exploring both monaural and binaural inputs and outputs. Our findings shed light on the relative difference between causal and non-causal models when performing separation. Our stateful implementation for online separation leads to…
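The idea of a stateful, causal model can be illustrated with a toy sketch. It is not the paper's network: a simple exponential moving average stands in for the recurrent layers, and every name here (`causal_step`, `process_chunk`, `stream`, `alpha`) is hypothetical. The point it shows is that a causal recurrence, fed chunk by chunk with its state carried over, produces the same output as running it over the full signal.

```python
def causal_step(state: float, x: float, alpha: float = 0.9):
    """One causal update: the output depends only on the past state and the current sample."""
    new_state = alpha * state + (1.0 - alpha) * x
    return new_state, new_state


def process_chunk(state: float, chunk):
    """Run the recurrence over one chunk, returning the final state and the outputs."""
    outputs = []
    for x in chunk:
        state, y = causal_step(state, x)
        outputs.append(y)
    return state, outputs


def stream(signal, chunk_size: int):
    """Online processing: consume fixed-size chunks and carry the state between them."""
    state = 0.0
    outputs = []
    for start in range(0, len(signal), chunk_size):
        state, y = process_chunk(state, signal[start:start + chunk_size])
        outputs.extend(y)
    return outputs


if __name__ == "__main__":
    signal = [float(i % 7) for i in range(50)]
    _, offline = process_chunk(0.0, signal)  # whole signal in one pass
    online = stream(signal, chunk_size=8)    # same signal, chunk by chunk
    print(online == offline)
```

A non-causal model cannot be streamed this way, because each output would also depend on samples that have not yet arrived.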
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
