DNN-based Source Enhancement to Increase Objective Sound Quality Assessment Score
Yuma Koizumi, Kenta Niwa, Yusuke Hioka, Kazunori Kobayashi, Yoichi Haneda

TL;DR
This paper introduces a DNN training method that directly optimizes objective sound quality assessment (OSQA) scores, such as PESQ, using black-box optimization techniques to improve the quality of the enhanced audio.
Contribution
It proposes a black-box optimization-based training scheme for DNNs that directly maximizes OSQA scores, overcoming the limitations of traditional MSE-based training.
Findings
OSQA scores significantly increased with the proposed method
The method improves sound quality without minimizing MSE
DNNs can be trained to directly optimize perceptual quality metrics
Abstract
We propose a training method for deep neural network (DNN)-based source enhancement to increase objective sound quality assessment (OSQA) scores such as the perceptual evaluation of speech quality (PESQ). In many conventional studies, DNNs have been used as a mapping function to estimate time-frequency masks and trained to minimize an analytically tractable objective function such as the mean squared error (MSE). Since OSQA scores have been used widely for sound-quality evaluation, constructing DNNs to increase OSQA scores should yield higher-quality output signals than minimizing the MSE. However, since most OSQA scores are not analytically tractable, i.e., they are black boxes, the gradient of the objective function cannot be calculated by simply applying back-propagation. To calculate the gradient of the OSQA-based objective function, we formulated a DNN…
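The abstract is cut off before it states the authors' formulation, so the sketch below does not reproduce it. It only illustrates one standard way to obtain a gradient for a score that cannot be back-propagated through: the score-function (REINFORCE, log-likelihood-ratio) estimator, applied to a binary time-frequency mask. Assumptions, all hypothetical: `toy_quality_score` stands in for a real OSQA metric such as PESQ and is only ever called for its value; the data are random toy magnitude spectrograms with additive magnitudes; and per-bin `logits` stand in for the DNN output (with a real network, the same logit-level gradient would be back-propagated into the weights by the chain rule).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def toy_quality_score(enhanced, clean):
    """Hypothetical stand-in for an OSQA metric such as PESQ (higher is better).

    The training loop only calls it for its value and never differentiates
    through it, which is how a black-box metric has to be treated.
    """
    return -np.mean(np.abs(enhanced - clean))


def expected_toy_score(probs, noisy, clean):
    """Exact E[score] under independent Bernoulli masks (for checking only).

    Computable here only because the toy score is separable across bins.
    """
    keep = np.abs(noisy - clean)   # error in a bin if the mask keeps it
    drop = np.abs(clean)           # error in a bin if the mask zeroes it
    return -np.mean(probs * keep + (1.0 - probs) * drop)


def score_function_gradient(logits, noisy, clean, score_fn, n_samples, rng):
    """REINFORCE-style estimate of dE[score]/dlogits for Bernoulli T-F masks."""
    probs = sigmoid(logits)
    # Sample binary time-frequency masks m ~ Bernoulli(probs).
    masks = (rng.random((n_samples,) + logits.shape) < probs).astype(float)
    # Query the black-box score once per sampled mask.
    scores = np.array([score_fn(m * noisy, clean) for m in masks])
    # Subtracting the batch-mean score is a common variance-reduction heuristic.
    advantages = scores - scores.mean()
    # d/dlogit of log Bernoulli(m | sigmoid(logit)) equals m - p.
    grad_log_prob = masks - probs
    return np.tensordot(advantages, grad_log_prob, axes=1) / n_samples


rng = np.random.default_rng(0)
clean = rng.uniform(0.0, 1.0, size=(4, 8))  # toy clean magnitude spectrogram
noise = rng.uniform(0.0, 1.0, size=(4, 8))  # toy noise magnitudes
noisy = clean + noise                       # additive-magnitude simplification

logits = np.zeros_like(clean)               # mask probability 0.5 in every bin
initial = expected_toy_score(sigmoid(logits), noisy, clean)
for _ in range(300):
    grad = score_function_gradient(logits, noisy, clean, toy_quality_score, 64, rng)
    logits += 10.0 * grad                   # gradient ascent on the score
final = expected_toy_score(sigmoid(logits), noisy, clean)
print(f"expected toy score: {initial:.3f} -> {final:.3f}")
```

The estimator needs only score values for sampled masks, which is what makes it usable with a non-differentiable metric; the price is gradient variance, which the baseline and the number of samples per step are there to control.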
