Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method

Cunhang Fan; Jianhua Tao; Bin Liu; Jiangyan Yi; Zhengqi Wen; Xuefei Liu

arXiv:2003.07544 · eess.AS · March 18, 2020 · 5 cites

TL;DR

This paper introduces an end-to-end post-filter that uses deep attention fusion features to refine the output of a pre-separation stage, significantly improving monaural speech separation performance.

Contribution

The paper proposes a post-filter based on deep attention fusion features that enhances pre-separated speech within an end-to-end framework and outperforms existing methods on standard datasets.

Findings

01. Achieved 64.1% relative improvement in SI-SNR
02. Improved SDR, PESQ, and STOI metrics significantly
03. Outperformed state-of-the-art speech separation methods
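
For readers unfamiliar with the headline metric, the following is a minimal sketch of the commonly used definition of scale-invariant signal-to-noise ratio (SI-SNR): both signals are mean-centred, the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The function names `si_snr` and `relative_improvement` are hypothetical, the `eps` stabiliser is an illustrative choice, and the relative-improvement formula is an assumption about how such a percentage is typically computed; this summary does not state the exact procedure used in the paper.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB (common definition; not taken from the paper)."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)
    # Remove the mean of each signal so DC offsets do not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # Project the estimate onto the target: the part of the estimate that
    # is a scaled copy of the reference signal.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    s_target = [scale * t for t in tgt]
    # Everything left over is treated as noise or interference.
    e_noise = [e - s for e, s in zip(est, s_target)]
    num = sum(s * s for s in s_target)
    den = sum(e * e for e in e_noise) + eps
    return 10.0 * math.log10(num / den + eps)


def relative_improvement(baseline, improved):
    """Percentage change of `improved` over `baseline` (assumed formula)."""
    return 100.0 * (improved - baseline) / abs(baseline)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric suits systems whose output gain is arbitrary. Under the assumed formula, moving from a hypothetical 10.0 dB to 15.0 dB would count as a 50% relative improvement; the baseline behind the 64.1% figure is not given in this summary.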

Abstract

In this paper, we propose an end-to-end post-filter method with deep attention fusion features for monaural speaker-independent speech separation. First, a time-frequency domain speech separation method is applied as a pre-separation stage, whose aim is to produce a preliminary separation of the mixture. Although this stage can separate the mixture, its output still contains residual interference. To enhance the pre-separated speech and further improve separation performance, we propose an end-to-end post-filter (E2EPF) with deep attention fusion features. The E2EPF makes full use of the prior knowledge in the pre-separated speech, which contributes to speech separation. It is a fully convolutional speech separation network that takes the waveform as its input. Within the E2EPF, a 1-D convolutional layer is first used to extract the deep representation features…
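
As a rough illustration of what extracting features from a waveform with a 1-D convolutional layer can look like, the sketch below slides a small bank of filters over the raw samples and applies a ReLU. It is dependency-free and not the authors' network: the function name `conv1d_encoder`, the hand-picked filter values, the stride, and the ReLU nonlinearity are all assumptions made for the example, and in a trained model the filters would be learned.

```python
def conv1d_encoder(waveform, filters, stride):
    """Frame a waveform and apply a bank of 1-D filters followed by ReLU.

    Returns one feature vector per frame, with one value per filter.
    Illustrative only: filter values, stride, and ReLU are assumptions.
    """
    kernel = len(filters[0])
    if any(len(f) != kernel for f in filters):
        raise ValueError("all filters must have the same length")
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    # Number of full frames that fit without padding (none if too short).
    n_frames = (len(waveform) - kernel) // stride + 1
    features = []
    for t in range(n_frames):
        start = t * stride
        frame = waveform[start:start + kernel]
        # Each filter produces one non-negative activation per frame.
        features.append(
            [max(0.0, sum(w * x for w, x in zip(f, frame))) for f in filters]
        )
    return features


if __name__ == "__main__":
    samples = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    bank = [[1.0, 1.0], [1.0, -1.0]]  # hypothetical two-tap filters
    for row in conv1d_encoder(samples, bank, stride=2):
        print(row)
```

Each row of the result is a learned-basis-style representation of one short waveform frame, which is the kind of input a later fusion or attention stage could operate on.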

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing