Guided Training: A Simple Method for Single-channel Speaker Separation
Hao Li, Xueliang Zhang, Guanglai Gao

TL;DR
This paper introduces a straightforward training method for single-channel speaker separation: a short guide utterance of the target speaker is placed at the beginning of the mixture so that the LSTM model can identify and separate that speaker.
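The input construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: signals are plain lists of time-domain samples (the paper's actual feature representation is not given on this page), the mixture is the sample-wise sum of the target and one interfering speaker, and the training target is assumed to be the guide followed by the clean target speech. The function name `build_guided_example` is hypothetical.

```python
from typing import List, Tuple


def build_guided_example(
    guide: List[float],
    target: List[float],
    interference: List[float],
) -> Tuple[List[float], List[float], int]:
    """Prepend a short clean segment of the target speaker to a two-talker mixture.

    Hypothetical helper. Returns (network_input, training_target, guide_len).
    """
    if len(target) != len(interference):
        raise ValueError("target and interference must have equal length")
    # Two-talker mixture: sample-wise sum of target and interfering speech.
    mixture = [t + i for t, i in zip(target, interference)]
    # The guide comes first, so the first speaker heard is the target.
    network_input = guide + mixture
    # Assumption: the reference is the guide followed by the clean target speech.
    training_target = guide + target
    return network_input, training_target, len(guide)
```

A model's output would then be cropped with `output[guide_len:]` to keep only the separated mixture region. Whether the guide frames also contribute to the loss is a design choice this summary does not specify.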
Contribution
The paper proposes a novel guided training strategy that uses a short segment of target speech to address the permutation problem in LSTM-based speaker separation.
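To make the permutation problem concrete, the sketch below contrasts a simplified permutation invariant training (PIT) loss, which must search over every output-to-speaker assignment, with the single fixed comparison that guided training allows because the guide defines which speaker is the target. It assumes mean squared error on raw sequences; PIT as published typically operates on masked spectra, and `pit_loss` and `guided_loss` are hypothetical names.

```python
from itertools import permutations
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(outputs: List[Sequence[float]], refs: List[Sequence[float]]) -> float:
    """Simplified PIT: score every output-to-speaker assignment, keep the cheapest."""
    n = len(refs)
    return min(
        sum(mse(outputs[i], refs[perm[i]]) for i in range(n)) / n
        for perm in permutations(range(n))
    )


def guided_loss(output: Sequence[float], target_ref: Sequence[float]) -> float:
    """Guided training: the guide fixes the target, so one output meets one reference."""
    return mse(output, target_ref)
```

PIT evaluates n! assignments for n speakers, while the guided loss needs a single comparison per example regardless of how many speakers are in the mixture.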
Findings
Effective in improving speaker separation performance
Utilizes sequence modeling capabilities of LSTM
Simplifies training process for speaker separation
Abstract
Deep learning has shown great potential for speech separation, especially for separating speech from non-speech. However, it encounters the permutation problem in multi-speaker separation, where both the target and the interference are speech. Permutation invariant training (PIT) was proposed to solve this problem by permuting the order of the multiple speakers. Another approach is to use an anchor speech, a short utterance of the target speaker, to model the speaker identity. In this paper, we propose a simple strategy for training a long short-term memory (LSTM) model to solve the permutation problem in speaker separation. Specifically, we insert a short utterance of the target speaker at the beginning of a mixture as guide information, so the first speaker to appear is defined as the target. Owing to its strong sequence-modeling capability, the LSTM can use its memory cells to track and separate target speech…
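The memory cells mentioned in the abstract are the LSTM's cell state, updated through sigmoid gates and tanh-squashed terms (the Tanh Activation and Sigmoid Activation entries in the taxonomy). Below is a minimal scalar (hidden size 1) LSTM step in plain Python, assuming the standard formulation without peephole connections. The parameter layout in `ScalarLSTMParams` is hypothetical, and the paper's actual network size and configuration are not stated on this page.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# (input weight, recurrent weight, bias) for one gate; hypothetical layout.
Gate = Tuple[float, float, float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class ScalarLSTMParams:
    w_i: Gate  # input gate
    w_f: Gate  # forget gate
    w_o: Gate  # output gate
    w_g: Gate  # candidate cell update


def _affine(w: Gate, x: float, h: float) -> float:
    return w[0] * x + w[1] * h + w[2]


def lstm_step(p: ScalarLSTMParams, x: float, h: float, c: float) -> Tuple[float, float]:
    """One standard LSTM step; returns (new hidden state, new cell state)."""
    i = sigmoid(_affine(p.w_i, x, h))
    f = sigmoid(_affine(p.w_f, x, h))
    o = sigmoid(_affine(p.w_o, x, h))
    g = math.tanh(_affine(p.w_g, x, h))
    c_new = f * c + i * g  # memory cell: keep part of the old state, add new content
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run_lstm(p: ScalarLSTMParams, xs: List[float]) -> Tuple[List[float], float]:
    """Run the cell over a sequence from a zero state; returns (hidden states, final cell)."""
    h, c = 0.0, 0.0
    hs: List[float] = []
    for x in xs:
        h, c = lstm_step(p, x, h, c)
        hs.append(h)
    return hs, c
```

When the forget gate stays close to 1, `c` decays slowly, which is how information written during the guide frames can persist across the mixture frames that follow.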
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
