Controlling the Perceived Sound Quality for Dialogue Enhancement with Deep Learning

Christian Uhle; Matteo Torcoli; Jouni Paulus

arXiv:2107.10562·eess.AS·July 23, 2021

TL;DR

This paper introduces a deep learning-based method for controlling the trade-off between attenuating the interfering background signal and preserving sound quality in dialogue enhancement, so that the output meets an adjustable sound quality target consistently across input signals.

Contribution

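A deep neural network estimates how much the separated background signal should be attenuated so that the sound quality, quantified using the Artifact-related Perceptual Score, meets an adjustable target, which limits the audible artifacts introduced by speech enhancement.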

Findings

01 Controls the trade-off with an accuracy that is adequate for real-world dialogue enhancement applications

02 Subjective evaluations indicate consistent sound quality across various input signals

03 An adjustable target sets the balance between background attenuation and loss of sound quality

Abstract

Speech enhancement attenuates interfering sounds in speech signals but may introduce artifacts that perceivably deteriorate the output signal. We propose a method for controlling the trade-off between the attenuation of the interfering background signal and the loss of sound quality. A deep neural network estimates the attenuation of the separated background signal such that the sound quality, quantified using the Artifact-related Perceptual Score, meets an adjustable target. Subjective evaluations indicate that consistent sound quality is obtained across various input signals. Our experiments show that the proposed method is able to control the trade-off with an accuracy that is adequate for real-world dialogue enhancement applications.
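The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes that the separated speech and background estimates are already available as sample lists, and that the output is formed by adding the attenuated background back to the speech. In the paper a deep neural network estimates the attenuation; here a simple bisection search stands in for it, purely to show what "meets an adjustable target" means. All function names are hypothetical, and `toy_quality` is an invented, monotonically decreasing placeholder, not the Artifact-related Perceptual Score.

```python
def db_to_gain(attenuation_db):
    """Convert an attenuation in dB (>= 0) to a linear gain in (0, 1]."""
    return 10.0 ** (-attenuation_db / 20.0)


def remix(speech_est, background_est, attenuation_db):
    """Add the attenuated background estimate back to the speech estimate."""
    gain = db_to_gain(attenuation_db)
    return [s + gain * b for s, b in zip(speech_est, background_est)]


def toy_quality(attenuation_db):
    """HYPOTHETICAL quality score that drops as attenuation grows.

    Placeholder only; it is not the Artifact-related Perceptual Score.
    """
    return 100.0 - 2.0 * attenuation_db


def attenuation_for_target(quality_fn, target, max_db=30.0, tol=1e-3):
    """Find the largest attenuation in [0, max_db] with quality >= target.

    Assumes quality_fn is non-increasing in the attenuation. This bisection
    is a stand-in for the network that estimates the attenuation directly.
    """
    if quality_fn(max_db) >= target:
        return max_db  # target is met even at maximum attenuation
    if quality_fn(0.0) < target:
        return 0.0  # target cannot be met; apply no attenuation
    lo, hi = 0.0, max_db
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if quality_fn(mid) >= target:
            lo = mid  # quality still sufficient, try more attenuation
        else:
            hi = mid  # too many artifacts, back off
    return lo


# Usage: pick the attenuation for a quality target, then remix.
attenuation_db = attenuation_for_target(toy_quality, target=80.0)
output = remix([1.0, 0.5], [0.2, -0.2], attenuation_db)
```

Raising the target lowers the attenuation that is applied, so more background remains in the output in exchange for fewer artifacts; lowering the target does the opposite.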
