Distortion-controlled Training for End-to-end Reverberant Speech Separation with Auxiliary Autoencoding Loss

Yi Luo; Cong Han; Nima Mesgarani

arXiv:2011.07338 · eess.AS · November 17, 2020 · SLT · 1 cites

TL;DR

This paper introduces a distortion-controlled training method with auxiliary autoencoding loss for end-to-end reverberant speech separation, improving separation quality and speech recognition accuracy by managing distortions caused by reverberations.

Contribution

It proposes a novel auxiliary autoencoding training approach (A2T) that controls distortions in reverberant speech separation, addressing the equal-valued contour problem and enhancing performance.

Findings

01 A2T effectively controls direct-path signal distortions.

02 Improved speech recognition accuracy with A2T.

03 Enhanced separation quality in reverberant environments.

Abstract

The performance of speech enhancement and separation systems in anechoic environments has been significantly advanced with the recent progress in end-to-end neural network architectures. However, the performance of such systems in reverberant environments is yet to be explored. A core problem in reverberant speech separation is about the training and evaluation metrics. Standard time-domain metrics may introduce unexpected distortions during training and fail to properly evaluate the separation performance due to the presence of the reverberations. In this paper, we first introduce the "equal-valued contour" problem in reverberant separation where multiple outputs can lead to the same performance measured by the common metrics. We then investigate how "better" outputs with lower target-specific distortions can be selected by auxiliary autoencoding training (A2T). A2T assumes that the…
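The "equal-valued contour" idea in the abstract can be illustrated with a small numerical sketch. It assumes that scale-invariant signal-to-distortion ratio (SI-SDR) is one of the "common metrics" in question, and it uses random noise in place of speech and reverberation, so it shows only the general effect, not the paper's experimental setup. The helper names `si_sdr` and `orthogonal_error` are hypothetical and chosen for this example.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(residual, residual) + eps))


def orthogonal_error(reference, seed):
    """Unit-norm, zero-mean error signal orthogonal to the reference (illustrative helper)."""
    ref0 = reference - reference.mean()
    err = np.random.default_rng(seed).standard_normal(reference.shape[0])
    err -= err.mean()
    err -= np.dot(err, ref0) / np.dot(ref0, ref0) * ref0
    return err / np.linalg.norm(err)


rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)  # stand-in for a target signal, not real speech

# Choose the error energy so that both estimates sit at 10 dB SI-SDR.
target_db = 10.0
ref_norm = np.linalg.norm(reference - reference.mean())
error_norm = ref_norm / 10 ** (target_db / 20)

# Two estimates with different error signals of identical energy.
estimate_a = reference + error_norm * orthogonal_error(reference, seed=1)
estimate_b = reference + error_norm * orthogonal_error(reference, seed=2)

score_a = si_sdr(estimate_a, reference)
score_b = si_sdr(estimate_b, reference)
print(f"SI-SDR a: {score_a:.3f} dB, SI-SDR b: {score_b:.3f} dB")
print("estimates identical:", np.allclose(estimate_a, estimate_b))
```

The two estimates differ sample by sample but receive the same score, because SI-SDR depends only on the energy of the residual and not on what the residual contains. A metric of this kind therefore cannot distinguish an output whose error distorts the target from one whose error is less harmful.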

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Advanced Adaptive Filtering Techniques