Ultra Fast Speech Separation Model with Teacher Student Learning
Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

TL;DR
This paper introduces an ultra fast Transformer model for speech separation that uses teacher-student learning to improve both performance and efficiency, reducing word error rate (WER) on the LibriCSS dataset.
Contribution
It proposes a novel layer-wise teacher-student learning framework with objective shifting to enhance small Transformer models for speech separation.
Findings
Reduces WER by over 5% compared with the small Transformer model trained from scratch.
Achieves more than 10% relative WER reduction with additional unlabeled data.
Demonstrates improved efficiency and performance on the LibriCSS dataset.
Abstract
Transformer has been successfully applied to speech separation recently with its strong long-dependency modeling capacity using a self-attention mechanism. However, Transformer tends to have heavy run-time costs due to the deep encoder layers, which hinders its deployment on edge devices. A small Transformer model with fewer encoder layers is preferred for computational efficiency, but it is prone to performance degradation. In this paper, an ultra fast speech separation Transformer model is proposed to achieve both better performance and efficiency with teacher student learning (T-S learning). We introduce layer-wise T-S learning and objective shifting mechanisms to guide the small student model to learn intermediate representations from the large teacher model. Compared with the small Transformer model trained from scratch, the proposed T-S learning method reduces the word error rate…
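The two mechanisms named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the evenly spaced layer mapping, the mean-squared-error distance, the linear shifting schedule, and every function name below (`map_layers`, `layerwise_loss`, `shift_weight`, `combined_loss`) are assumptions chosen for illustration. The idea shown is that each student layer is matched to one teacher layer and penalized for differing from it, while a weight gradually moves the training objective from imitating the teacher toward the separation task itself.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def map_layers(num_student: int, num_teacher: int) -> List[int]:
    """Pick one teacher layer index (0-based) per student layer.

    Assumption: layers are matched at even intervals, and the last
    student layer is always matched to the last teacher layer.
    """
    if num_student <= 0 or num_teacher < num_student:
        raise ValueError("need 0 < num_student <= num_teacher")
    return [((i + 1) * num_teacher) // num_student - 1
            for i in range(num_student)]


def layerwise_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Average distance between each student layer output and the
    output of its mapped teacher layer (assumed distance: MSE)."""
    mapping = map_layers(len(student), len(teacher))
    pairs = [mse(s, teacher[t]) for s, t in zip(student, mapping)]
    return sum(pairs) / len(pairs)


def shift_weight(step: int, total_steps: int) -> float:
    """Weight on the distillation term at a given training step.

    Assumption: linear decay from 1.0 at step 0 to 0.0 at total_steps.
    """
    return max(0.0, 1.0 - step / total_steps)


def combined_loss(distill: float, task: float,
                  step: int, total_steps: int) -> float:
    """Blend the two objectives: mostly distillation early in training,
    mostly the task loss late in training."""
    w = shift_weight(step, total_steps)
    return w * distill + (1.0 - w) * task


# Toy usage: a 2-layer student imitating a 4-layer teacher.
teacher_outputs = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
student_outputs = [[1.0, 1.0], [3.0, 5.0]]
distill = layerwise_loss(student_outputs, teacher_outputs)
loss_midway = combined_loss(distill, task=3.0, step=50, total_steps=100)
```

In a real system the vectors would be hidden-state tensors from the Transformer encoders and the task term would be the separation training loss; plain Python lists are used here only to keep the sketch self-contained.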
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Dropout · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Softmax
