Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method
Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen, Xuefei Liu

TL;DR
This paper introduces an end-to-end post-filter with deep attention fusion features that significantly improves monaural speech separation performance by effectively utilizing prior separation information and deep attention mechanisms.
Contribution
It proposes a novel deep attention fusion feature-based post-filter that enhances pre-separated speech in an end-to-end framework, outperforming existing methods on standard datasets.
Findings
Achieved 64.1% relative improvement in SI-SNR
Improved SDR, PESQ, and STOI metrics significantly
Outperformed state-of-the-art speech separation methods
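For readers unfamiliar with the headline metric, the following is a minimal sketch of the commonly used definition of scale-invariant signal-to-noise ratio (SI-SNR). It is not taken from the paper: the function name `si_snr`, the `eps` stabiliser, and the pure-Python list representation are assumptions made for illustration, and the paper's own evaluation setup may differ in detail.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB (common definition; illustrative sketch)."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean from both signals.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    # Project the estimate onto the target to get the "target" component.
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t) + eps
    scale = dot / target_energy
    s_target = [scale * b for b in t]

    # Whatever is left after the projection is treated as noise.
    e_noise = [a - b for a, b in zip(e, s_target)]

    signal_power = sum(x * x for x in s_target)
    noise_power = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(signal_power / noise_power + eps)


# Toy usage: a clean sinusoid and two estimates with different noise levels.
clean = [math.sin(0.1 * i) for i in range(200)]
light = [c + 0.1 * math.cos(0.37 * i) for i, c in enumerate(clean)]
heavy = [c + 0.5 * math.cos(0.37 * i) for i, c in enumerate(clean)]
print(si_snr(light, clean), si_snr(heavy, clean))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is called scale-invariant.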
Abstract
In this paper, we propose an end-to-end post-filter method with deep attention fusion features for monaural speaker-independent speech separation. First, a time-frequency domain speech separation method is applied as the pre-separation stage. The aim of the pre-separation stage is to separate the mixture preliminarily. Although this stage can separate the mixture, its output still contains residual interference. To enhance the pre-separated speech and further improve separation performance, the end-to-end post-filter (E2EPF) with deep attention fusion features is proposed. The E2EPF can make full use of the prior knowledge of the pre-separated speech, which contributes to speech separation. It is a fully convolutional speech separation network and uses the waveform as the input features. First, the 1-D convolutional layer is utilized to extract the deep representation features…
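The abstract's first step, a 1-D convolutional layer applied directly to the waveform, can be illustrated with a small sketch. Everything specific here is an assumption for illustration only: the function name `conv1d_encoder`, the filter values, the kernel size, the stride, and the ReLU non-linearity are hypothetical and are not given in the truncated abstract above.

```python
def conv1d_encoder(waveform, filters, stride):
    """Slide each filter over the waveform and return one feature vector per frame.

    Hypothetical sketch: a strided 1-D convolution followed by ReLU
    (the non-linearity is an assumption, not stated in the abstract).
    """
    kernel_size = len(filters[0])
    frames = []
    for start in range(0, len(waveform) - kernel_size + 1, stride):
        window = waveform[start:start + kernel_size]
        # One output channel per filter: dot product, then ReLU.
        features = [
            max(0.0, sum(w * x for w, x in zip(f, window)))
            for f in filters
        ]
        frames.append(features)
    return frames


# Toy usage: two hand-written filters of length 2, hop of 2 samples.
toy_waveform = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_filters = [[1.0, 1.0], [1.0, -1.0]]
print(conv1d_encoder(toy_waveform, toy_filters, stride=2))
```

In a trained network the filters would be learned rather than fixed, so the layer acts as a learned replacement for a hand-designed time-frequency transform.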
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
