DNN-based Source Enhancement to Increase Objective Sound Quality Assessment Score

Yuma Koizumi, Kenta Niwa, Yusuke Hioka, Kazunori Kobayashi, Yoichi Haneda

arXiv:1810.09137 · stat.ML · October 23, 2018

TL;DR

This paper introduces a novel DNN training method that directly optimizes objective sound quality assessment (OSQA) scores, such as PESQ, using black-box optimization techniques to improve perceived audio quality.

Contribution

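It proposes a black-box optimization-based training scheme for DNNs that directly maximizes OSQA scores, overcoming a limitation of traditional training that minimizes the mean squared error (MSE): the MSE is an analytically tractable proxy, not the quality score itself.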
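The central difficulty in this contribution, maximizing a score whose gradient is unavailable, can be illustrated with a toy example. The abstract on this page is truncated before it names the gradient estimator, so the sketch below swaps in one standard black-box technique, a score-function (REINFORCE-style) gradient estimate, and should not be read as the authors' exact formulation. Assumptions: the DNN is replaced by a vector of mask means `mu`, one per toy time-frequency bin; each training step samples masks from a Gaussian around those means; and `toy_quality_score` is a hypothetical stand-in for PESQ that is only ever evaluated, never differentiated.

```python
import random


def toy_quality_score(mask, noisy, clean):
    """Hypothetical stand-in for a black-box OSQA score such as PESQ.

    Only its value is ever used; no gradient is taken through it.
    Higher is better; 0 means the masked signal matches the clean one.
    """
    return -sum(abs(m * x - s) for m, x, s in zip(mask, noisy, clean))


def estimate_gradient(mu, sigma, noisy, clean, num_samples, rng):
    """Score-function estimate of d/dmu E[Q(m)] with m ~ N(mu, sigma^2 I).

    Uses grad_mu E[Q(m)] = E[(Q(m) - b) * (m - mu) / sigma^2], where b is the
    leave-one-out mean of the other samples' scores, which keeps it unbiased.
    """
    masks = [[rng.gauss(u, sigma) for u in mu] for _ in range(num_samples)]
    scores = [toy_quality_score(m, noisy, clean) for m in masks]
    total = sum(scores)
    grad = [0.0] * len(mu)
    for m, q in zip(masks, scores):
        baseline = (total - q) / (num_samples - 1)
        for k in range(len(mu)):
            grad[k] += (q - baseline) * (m[k] - mu[k]) / sigma ** 2
    return [g / num_samples for g in grad]


def train_mask(noisy, clean, steps=300, lr=0.01, sigma=0.1, num_samples=32, seed=0):
    """Gradient ascent on the mask means using only black-box score values."""
    rng = random.Random(seed)
    mu = [0.5] * len(noisy)  # hypothetical: one mask mean per toy T-F bin
    for _ in range(steps):
        grad = estimate_gradient(mu, sigma, noisy, clean, num_samples, rng)
        # Ascend (the score is maximized) and keep each mask value in [0, 1].
        mu = [min(1.0, max(0.0, u + lr * g)) for u, g in zip(mu, grad)]
    return mu


if __name__ == "__main__":
    noisy = [1.0, 2.0, 4.0]  # toy noisy magnitudes
    clean = [1.0, 1.0, 1.0]  # toy clean magnitudes; ideal mask is [1.0, 0.5, 0.25]
    print([round(v, 2) for v in train_mask(noisy, clean)])
```

Only score values enter the update, so the same loop would run with any evaluator that returns a number. The price is a noisy gradient, which is why a baseline is subtracted and several masks are sampled per step.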

Findings

1. OSQA scores increased significantly with the proposed method
2. The method improves sound quality without minimizing the MSE
3. DNNs can be trained to directly optimize perceptual quality metrics

Abstract

We propose a training method for deep neural network (DNN)-based source enhancement to increase objective sound quality assessment (OSQA) scores such as the perceptual evaluation of speech quality (PESQ). In many conventional studies, DNNs have been used as a mapping function to estimate time-frequency masks and trained to minimize an analytically tractable objective function such as the mean squared error (MSE). Since OSQA scores have been used widely for sound-quality evaluation, constructing DNNs to increase OSQA scores would be better than using the minimum-MSE to create high-quality output signals. However, since most OSQA scores are not analytically tractable, i.e., they are black boxes, the gradient of the objective function cannot be calculated by simply applying back-propagation. To calculate the gradient of the OSQA-based objective function, we formulated a DNN…
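For contrast, the conventional objective described in the abstract is analytically tractable. The sketch below assumes a mask G applied element-wise to real-valued noisy time-frequency magnitudes X and compared with clean magnitudes S (a simplification of STFT-domain masking; all function names are hypothetical). With K bins, L = (1/K) Σ_k (G_k X_k − S_k)², so dL/dG_k = 2 X_k (G_k X_k − S_k) / K, a closed form that back-propagation can hand to the network. The finite-difference check is there only to confirm the formula; a black-box score such as PESQ offers no such expression.

```python
def apply_mask(mask, noisy):
    """Element-wise time-frequency masking: Y[k] = G[k] * X[k]."""
    return [g * x for g, x in zip(mask, noisy)]


def mse_loss(mask, noisy, clean):
    """L = (1/K) * sum_k (G[k] * X[k] - S[k])^2."""
    est = apply_mask(mask, noisy)
    return sum((y - s) ** 2 for y, s in zip(est, clean)) / len(clean)


def mse_grad(mask, noisy, clean):
    """Closed-form gradient: dL/dG[k] = 2 * X[k] * (G[k] * X[k] - S[k]) / K."""
    k_bins = len(clean)
    return [2.0 * x * (g * x - s) / k_bins for g, x, s in zip(mask, noisy, clean)]


def finite_difference_grad(mask, noisy, clean, eps=1e-6):
    """Central differences, used only to check the closed form above."""
    grad = []
    for k in range(len(mask)):
        up = list(mask)
        up[k] += eps
        down = list(mask)
        down[k] -= eps
        grad.append((mse_loss(up, noisy, clean) - mse_loss(down, noisy, clean)) / (2 * eps))
    return grad


if __name__ == "__main__":
    noisy = [1.0, 2.0, 4.0]
    clean = [1.0, 1.0, 1.0]
    mask = [0.5, 0.5, 0.5]
    print(mse_grad(mask, noisy, clean))
    print(finite_difference_grad(mask, noisy, clean))
```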
