Atss-Net: Target Speaker Separation via Attention-based Neural Network

Tingle Li; Qingjian Lin; Yuanyuan Bao; Ming Li

arXiv:2005.09200 · eess.AS · May 20, 2020 · 5 cites

TL;DR

This paper introduces Atss-Net, an attention-based neural network for target speaker separation that outperforms VoiceFilter while using only about half as many parameters, and that also shows promise in speech enhancement.

Contribution

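The paper presents a novel attention-based neural network architecture for target speaker separation. Compared with CNN-LSTM models such as VoiceFilter, it computes correlations between features in parallel and extracts more features with shallower layers, achieving better performance with fewer parameters.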
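
The contribution above rests on self-attention, which scores every pair of positions at once instead of stepping through time as an LSTM does. The following is a minimal sketch of single-head scaled dot-product self-attention over spectrogram frames in pure Python. The function names are hypothetical, and as a simplification the learned query/key/value projections are omitted, so this illustrates the general mechanism rather than Atss-Net's actual layer configuration.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(frames):
    """Single-head scaled dot-product self-attention (hypothetical sketch).

    frames: T vectors of length F, e.g. T spectrogram frames with F bins.
    Simplification: queries, keys and values are the frames themselves
    (no learned projection matrices).
    """
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    # Each score q_i . k_j is independent of all the others, so the full
    # T x T score matrix can be computed in parallel (one matrix product).
    scores = [[scale * sum(q * k for q, k in zip(qi, kj)) for kj in frames]
              for qi in frames]
    # Normalize each row into attention weights that sum to 1.
    weights = [softmax(row) for row in scores]
    # Each output frame is a weighted sum of all value vectors.
    out = [[sum(w * v[f] for w, v in zip(row, frames)) for f in range(d)]
           for row in weights]
    return out, weights

# Usage: three toy frames with two frequency bins each.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(frames)
```

Because no score depends on a previously computed one, a practical implementation evaluates the whole score matrix as a single matrix multiplication, which is the kind of parallelism the paper contrasts with recurrent processing.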

Findings

01

Atss-Net outperforms VoiceFilter in speaker separation tasks.

02

Atss-Net uses about half as many parameters as VoiceFilter.

03

The model shows promising results in speech enhancement.

Abstract

Recently, models based on Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) have been introduced to deep learning-based target speaker separation. In this paper, we propose an attention-based neural network (Atss-Net) operating in the spectrogram domain for this task. Compared with the CNN-LSTM architecture, it allows the network to compute the correlations between features in parallel and to extract more features with shallower layers. Experimental results show that Atss-Net outperforms VoiceFilter while containing only half as many parameters. Furthermore, the proposed model shows promising performance in speech enhancement.
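
VoiceFilter, the baseline named in the abstract, separates the target speaker by predicting a soft mask on the mixture's magnitude spectrogram, conditioned on an embedding (d-vector) of the target speaker. The sketch below assumes that a spectrogram-domain separator such as Atss-Net produces its output in the same masking style; the function `conditioned_mask` and the single linear layer standing in for the whole network (CNN-LSTM in VoiceFilter, attention layers in Atss-Net) are hypothetical.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function, mapping any score into (0, 1).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def conditioned_mask(mixture_mag, d_vector, weights, bias):
    """Hypothetical speaker-conditioned spectrogram masking step.

    mixture_mag: T frames x F magnitude bins of the mixture spectrogram.
    d_vector: embedding of the target speaker (length D).
    weights: F rows of length F + D; a stand-in for the learned network.
    bias: length F.
    Returns the estimated target-speaker magnitude spectrogram (T x F).
    """
    estimate = []
    for frame in mixture_mag:
        # Condition every frame on the target speaker by concatenation.
        x = frame + d_vector
        # One mask value in (0, 1) per frequency bin.
        mask = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(weights, bias)]
        # Apply the soft mask element-wise to the mixture magnitudes.
        estimate.append([m * y for m, y in zip(mask, frame)])
    return estimate

# Usage: 2 frames x 2 bins, a 1-dim speaker embedding, and all-zero
# parameters, which yield a mask of 0.5 everywhere.
mixture = [[1.0, 2.0], [0.5, 0.0]]
d_vec = [1.0]
estimate = conditioned_mask(mixture, d_vec,
                            weights=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
                            bias=[0.0, 0.0])
```

Since the mask lies in (0, 1), the estimate can only attenuate the mixture's magnitudes; the phase is typically taken from the mixture when resynthesizing the waveform.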

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing