Multi-Objective Learning and Mask-Based Post-Processing for Deep Neural Network Based Speech Enhancement

Yong Xu; Jun Du; Zhen Huang; Li-Rong Dai; Chin-Hui Lee

arXiv:1703.07172 · cs.SD · March 22, 2017 · 32 cites

TL;DR

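This paper introduces a multi-objective deep learning framework for speech enhancement that jointly learns the primary target (clean log-power spectra) and secondary targets such as MFCCs and the ideal binary mask. The authors report that the joint scheme can improve learning of the primary target and that the learned mask supports IBM-based post-processing.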

Contribution

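It presents a joint learning architecture in which an auxiliary structure for secondary targets (MFCCs, IBM) is integrated into the original DNN and all parameters are optimized together, with the aim of improving the primary LPS estimate and enabling mask-based post-processing.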

Findings

01. Joint LPS and MFCC learning improves speech enhancement.

02. IBM-based post-processing further enhances speech quality.

03. The framework outperforms traditional single-target methods.

Abstract

We propose a multi-objective framework to learn both secondary targets not directly related to the intended task of speech enhancement (SE) and the primary target of the clean log-power spectra (LPS) features to be used directly for constructing the enhanced speech signals. In deep neural network (DNN) based SE we introduce an auxiliary structure to learn secondary continuous features, such as mel-frequency cepstral coefficients (MFCCs), and categorical information, such as the ideal binary mask (IBM), and integrate it into the original DNN architecture for joint optimization of all the parameters. This joint estimation scheme imposes additional constraints not available in the direct prediction of LPS, and potentially improves the learning of the primary target. Furthermore, the learned secondary information as a byproduct can be used for other purposes, e.g., the IBM-based…
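
The joint estimation idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single shared hidden layer, the layer sizes, the use of mean squared error for every target (including the mask), and the weights `alpha` and `beta` are all assumptions made for illustration. Only the IBM rule (1 where the local SNR exceeds a threshold, 0 otherwise) is the standard definition.

```python
import numpy as np


def ideal_binary_mask(clean_power, noise_power, threshold_db=0.0):
    """Return 1.0 where the local SNR (in dB) exceeds threshold_db, else 0.0."""
    eps = 1e-12  # avoids division by zero and log(0)
    snr_db = 10.0 * np.log10((clean_power + eps) / (noise_power + eps))
    return (snr_db > threshold_db).astype(np.float64)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, n_lps, n_mfcc, seed=0):
    """Hypothetical layout: one shared hidden layer and three output heads."""
    rng = np.random.default_rng(seed)

    def layer(n_from, n_to):
        return 0.1 * rng.standard_normal((n_from, n_to)), np.zeros(n_to)

    return {
        "hidden": layer(n_in, n_hidden),
        "lps": layer(n_hidden, n_lps),    # primary target
        "mfcc": layer(n_hidden, n_mfcc),  # secondary, continuous
        "ibm": layer(n_hidden, n_lps),    # secondary, one mask value per bin
    }


def forward(noisy_features, params):
    """Map noisy input frames to (LPS, MFCC, IBM) estimates."""
    w, b = params["hidden"]
    h = sigmoid(noisy_features @ w + b)  # shared representation
    w, b = params["lps"]
    lps = h @ w + b                      # linear output
    w, b = params["mfcc"]
    mfcc = h @ w + b                     # linear output
    w, b = params["ibm"]
    ibm = sigmoid(h @ w + b)             # squashed to (0, 1)
    return lps, mfcc, ibm


def multi_objective_loss(pred, ref, alpha=0.1, beta=0.1):
    """Primary LPS error plus weighted secondary errors (assumed weighting)."""
    lps_p, mfcc_p, ibm_p = pred
    lps_r, mfcc_r, ibm_r = ref
    primary = np.mean((lps_p - lps_r) ** 2)
    secondary_mfcc = np.mean((mfcc_p - mfcc_r) ** 2)
    secondary_ibm = np.mean((ibm_p - ibm_r) ** 2)
    return primary + alpha * secondary_mfcc + beta * secondary_ibm
```

Because the three heads share the hidden layer, gradients from the secondary terms also shape the representation used by the LPS head, which is one way to read the "additional constraints" mentioned in the abstract. Setting `alpha` and `beta` to zero reduces the loss to a single-target LPS objective.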

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques