Stable Training of DNN for Speech Enhancement based on Perceptually-Motivated Black-Box Cost Function

Masaki Kawanaka; Yuma Koizumi; Ryoichi Miyazaki; Kohei Yatabe

arXiv:2002.05879·eess.AS·February 17, 2020·1 cites

Open Access

TL;DR

This paper introduces a stable method for training deep neural networks for speech enhancement against perceptually-motivated black-box cost functions such as PESQ. It applies stabilization techniques borrowed from reinforcement learning and reports state-of-the-art PESQ scores on public datasets.

Contribution

The paper proposes stabilization techniques, borrowed from reinforcement learning, for training DNNs with non-differentiable perceptual quality measures, with the aim of improving sound quality in speech enhancement.

Findings

01. Successfully trained a DNN to improve PESQ scores

02. Achieved state-of-the-art PESQ scores on public datasets

03. Produced better sound quality than traditional methods

Abstract

Improving the subjective sound quality of enhanced signals is one of the most important goals in speech enhancement. To evaluate subjective quality, several perceptually-motivated objective sound quality assessment (OSQA) methods have been proposed, such as PESQ (perceptual evaluation of speech quality). However, such measures usually cannot be used directly to train a deep neural network (DNN) because popular OSQAs are non-differentiable with respect to the DNN parameters. A previous study therefore proposed approximating the OSQA score with an auxiliary DNN so that its gradient can be used to train the primary DNN. One problem with this approach is training instability caused by the approximation error of the score. To overcome this problem, we propose to use stabilization techniques borrowed from reinforcement learning. The experiments,…
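The surrogate idea described in the abstract (approximate a non-differentiable score with a differentiable model, then follow that model's gradient) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `black_box_score` is a quantized negative MSE standing in for PESQ, a single scalar gain stands in for the primary DNN, and a quadratic fitted with `np.polyfit` stands in for the auxiliary DNN. The reinforcement-learning stabilization techniques the paper proposes are not shown here.

```python
import numpy as np


def black_box_score(enhanced, clean):
    """Hypothetical stand-in for a non-differentiable metric such as PESQ.

    The floor() quantization makes the score piecewise constant, so its
    gradient is zero almost everywhere. Higher is better.
    """
    mse = np.mean((enhanced - clean) ** 2)
    return -np.floor(mse * 100.0) / 100.0


def fit_surrogate(gains, scores):
    """Fit a quadratic surrogate of the score (toy stand-in for the auxiliary DNN)."""
    return np.polyfit(gains, scores, deg=2)


def surrogate_gradient(coeffs, gain):
    """Derivative of the fitted quadratic surrogate evaluated at `gain`."""
    return np.polyval(np.polyder(coeffs), gain)


def train_gain(noisy, clean, gain=0.1, steps=30, n_probe=32,
               sigma=0.2, lr=0.3, seed=0):
    """Gradient ascent on the surrogate; the black box is only ever queried."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        # Query the black-box score at gains sampled around the current one.
        probes = gain + sigma * rng.standard_normal(n_probe)
        scores = np.array([black_box_score(p * noisy, clean) for p in probes])
        # Refit the surrogate, then step along its gradient.
        coeffs = fit_surrogate(probes, scores)
        gain = gain + lr * surrogate_gradient(coeffs, gain)
    return float(gain)


# Synthetic data: a sinusoid as "clean speech" plus Gaussian noise.
data_rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
clean = np.sin(2.0 * np.pi * 5.0 * t)
noisy = clean + 0.5 * data_rng.standard_normal(t.size)

initial_gain = 0.1
final_gain = train_gain(noisy, clean, gain=initial_gain)
initial_score = black_box_score(initial_gain * noisy, clean)
final_score = black_box_score(final_gain * noisy, clean)
print(initial_score, final_score)
```

In this sketch the surrogate is refit at every step from fresh queries, so its approximation error feeds directly into each update. That is the kind of error the abstract identifies as the source of training instability.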

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing