Ultra Fast Speech Separation Model with Teacher Student Learning

Sanyuan Chen; Yu Wu; Zhuo Chen; Jian Wu; Takuya Yoshioka; Shujie Liu; Jinyu Li; Xiangzhan Yu

arXiv:2204.12777 · eess.AS · April 28, 2022

TL;DR

This paper introduces an ultra fast speech separation Transformer model that uses teacher-student learning to improve both performance and efficiency, significantly reducing the word error rate (WER) on the LibriCSS dataset.

Contribution

The paper proposes a layer-wise teacher-student learning framework with objective shifting to improve small Transformer models for speech separation.

Findings

01. Reduces WER by over 5% compared to training from scratch.

02. Achieves more than 10% relative WER reduction with additional unlabeled data.

03. Demonstrates improved efficiency and performance on the LibriCSS dataset.

Abstract

Transformer has been successfully applied to speech separation recently with its strong long-dependency modeling capacity using a self-attention mechanism. However, Transformer tends to have heavy run-time costs due to the deep encoder layers, which hinders its deployment on edge devices. A small Transformer model with fewer encoder layers is preferred for computational efficiency, but it is prone to performance degradation. In this paper, an ultra fast speech separation Transformer model is proposed to achieve both better performance and efficiency with teacher student learning (T-S learning). We introduce layer-wise T-S learning and objective shifting mechanisms to guide the small student model to learn intermediate representations from the large teacher model. Compared with the small Transformer model trained from scratch, the proposed T-S learning method reduces the word error rate…
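The abstract names two mechanisms, layer-wise T-S learning and objective shifting, without giving their exact form here. The following is a minimal sketch of how such a combined training loss might look, not the authors' implementation. The mean-squared-error distance, the student-to-teacher layer mapping `layer_map`, and the linear shifting schedule are all assumptions made for illustration, as are the function names.

```python
# Illustrative sketch only. The loss form, the layer mapping, and the
# linear schedule are assumptions; the paper's actual choices may differ.

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def layerwise_ts_loss(student_layers, teacher_layers, layer_map):
    """Average MSE between each student layer output and the teacher
    layer output it is mapped to (hypothetical mapping)."""
    losses = [
        mse(student_layers[s], teacher_layers[t])
        for s, t in layer_map.items()
    ]
    return sum(losses) / len(losses)


def shifted_objective(ts_loss, task_loss, step, total_steps):
    """Shift weight linearly from the layer-wise T-S loss toward the
    task (separation) loss as training progresses (assumed schedule)."""
    alpha = min(max(step / total_steps, 0.0), 1.0)
    return (1.0 - alpha) * ts_loss + alpha * task_loss


# Toy usage: a 2-layer student mimics layers 1 and 3 of a 4-layer teacher.
student = [[0.0, 0.0], [1.0, 1.0]]
teacher = [[9.0, 9.0], [1.0, 1.0], [9.0, 9.0], [1.0, 3.0]]
ts = layerwise_ts_loss(student, teacher, layer_map={0: 1, 1: 3})
loss_mid_training = shifted_objective(ts, task_loss=0.5, step=5, total_steps=10)
```

Under this assumed schedule, early training is dominated by matching the teacher's intermediate representations, and the final steps optimize only the separation objective.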

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Dropout · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Softmax