On the Design and Training Strategies for RNN-based Online Neural Speech Separation Systems
Kai Li, Yi Luo

TL;DR
This paper explores methods to convert offline RNN-based neural speech separation systems into effective online systems, reducing performance gaps through layer reorganization and specialized training strategies.
Contribution
The paper introduces a layer decomposition and reorganization approach for bidirectional RNN layers, together with two training strategies (one using a pretrained offline model, one using a multitask training objective), to improve online speech separation performance.
Findings
Layer decomposition and reorganization mitigate the performance gap between online and offline models.
Training with a pretrained offline model or a multitask objective improves the online model.
The proposed methods outperform baseline online models.
Abstract
While the performance of offline neural speech separation systems has been greatly advanced by the recent development of novel neural network architectures, there is typically an inevitable performance gap between these systems and their online variants. In this paper, we investigate how RNN-based offline neural speech separation systems can be changed into their online counterparts while mitigating the performance degradation. We decompose or reorganize the forward and backward RNN layers in a bidirectional RNN layer to form an online path and an offline path, which enables the model to perform both online and offline processing with the same set of model parameters. We further introduce two training strategies for improving the online model via either a pretrained offline model or a multitask training objective. Experiment results show that compared to the online models that are trained…
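The idea of an online path and an offline path sharing one set of parameters can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact decomposition, so the sketch assumes the simplest reading, in which the online path uses only the forward RNN and the offline path uses the forward and backward RNNs together. The names `TinyRNN` and `DualPathBiRNN`, the scalar hidden state, and the weight values are all hypothetical and chosen only for illustration.

```python
import math


class TinyRNN:
    """Hypothetical scalar Elman RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""

    def __init__(self, w_x, w_h, b=0.0):
        self.w_x, self.w_h, self.b = w_x, w_h, b

    def run(self, xs):
        # Process the sequence left to right, starting from a zero state.
        h, out = 0.0, []
        for x in xs:
            h = math.tanh(self.w_x * x + self.w_h * h + self.b)
            out.append(h)
        return out


class DualPathBiRNN:
    """Assumed decomposition of a bidirectional layer into two paths
    that reuse the same forward-RNN parameters."""

    def __init__(self, fwd, bwd):
        self.fwd, self.bwd = fwd, bwd

    def online(self, xs):
        # Online (causal) path: forward RNN only, so the output at
        # time t depends on inputs up to and including t.
        return self.fwd.run(xs)

    def offline(self, xs):
        # Offline path: the same forward RNN, plus a backward RNN that
        # reads the sequence in reverse and therefore sees future input.
        f = self.fwd.run(xs)
        b = self.bwd.run(xs[::-1])[::-1]
        return list(zip(f, b))


# Illustrative weights and input; none of these values come from the paper.
layer = DualPathBiRNN(TinyRNN(0.5, 0.3), TinyRNN(-0.4, 0.2))
signal = [0.1, -0.2, 0.4, 0.0]
online_out = layer.online(signal)
offline_out = layer.offline(signal)
```

In this sketch the forward half of every offline output equals the online output, which is what sharing parameters between the two paths means here. Only the backward half changes when future samples are added or removed.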
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research
