Controlling the Remixing of Separated Dialogue with a Non-Intrusive Quality Estimate

Matteo Torcoli; Jouni Paulus; Thorsten Kastner; Christian Uhle

arXiv:2107.10151 · eess.AS · March 24, 2023

TL;DR

This paper introduces a non-intrusive audio quality estimation method using deep neural networks to control the remixing of separated dialogue, balancing interferer attenuation and audio quality in a signal-adaptive way.

Contribution

The paper proposes a non-intrusive quality estimation approach that uses DNNs to estimate the 2f-model, enabling control of the remixing without needing the reference target source.

Findings

01. The intrusive estimate iDNN2f correlates strongly with the original measure, the 2f-model (r = 0.99).

02. The non-intrusive estimates still achieve high correlation (r ≥ 0.91).

03. Listening tests confirm successful quality control, with significant differences in the selected gains.
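
To make the correlation figures concrete, here is a minimal sketch of how such a value could be computed, assuming r denotes the Pearson correlation coefficient between the 2f-model scores and the DNN estimates on the same test items (this listing does not state which coefficient is meant). The function name `pearson_r` and its arguments are illustrative and not taken from the paper.

```python
import numpy as np


def pearson_r(reference_scores, estimated_scores):
    """Pearson correlation between two equally long 1-D score sequences.

    reference_scores: e.g. 2f-model scores computed with the reference target.
    estimated_scores: e.g. DNN estimates of those scores (hypothetical input).
    """
    x = np.asarray(reference_scores, dtype=float)
    y = np.asarray(estimated_scores, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("inputs must be 1-D, equally long, with >= 2 items")

    # Remove the mean of each sequence, then normalise the cross term.
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))
    if denom == 0.0:
        raise ValueError("correlation is undefined for constant inputs")
    return float(np.sum(xc * yc) / denom)
```

A value near 1 means the estimates rank and scale the items almost exactly as the reference measure does; it says nothing about a constant offset between the two.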

Abstract

Remixing separated audio sources trades off interferer attenuation against the amount of audible deteriorations. This paper proposes a non-intrusive audio quality estimation method for controlling this trade-off in a signal-adaptive manner. The recently proposed 2f-model is adopted as the underlying quality measure, since it has been shown to correlate strongly with basic audio quality in source separation. An alternative operation mode of the measure is proposed, more appropriate when considering material with long inactive periods of the target source. The 2f-model requires the reference target source as an input, but this is not available in many applications. Deep neural networks (DNNs) are trained to estimate the 2f-model intrusively using the reference target (iDNN2f), non-intrusively using the input mix as reference (nDNN2f), and reference-free using only the separated output…
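
The trade-off described in the abstract can be illustrated with a small sketch. It assumes the separator yields a dialogue estimate and a background estimate, that the remix is the dialogue plus the attenuated background, and that a non-intrusive estimator scores the remix against the input mix. The function names, the candidate grid, and the rule "pick the strongest attenuation that still meets a minimum quality" are assumptions made for illustration, not the paper's exact procedure; `estimate_quality` is a hypothetical stand-in for a trained model such as nDNN2f.

```python
import numpy as np


def remix(dialogue_est, background_est, attenuation_db):
    """Remix with the background attenuated by attenuation_db (>= 0) decibels."""
    gain = 10.0 ** (-attenuation_db / 20.0)
    return dialogue_est + gain * background_est


def select_attenuation_db(dialogue_est, background_est, mix,
                          estimate_quality, min_quality,
                          candidates_db=(0, 3, 6, 9, 12, 15, 18)):
    """Return the largest candidate attenuation whose remix meets min_quality.

    estimate_quality(output, mix) is a hypothetical callable returning a
    quality score (higher is better) without using the reference target.
    Falls back to the smallest candidate if none meets the constraint.
    """
    best = float(min(candidates_db))
    for att in sorted(candidates_db):
        output = remix(dialogue_est, background_est, att)
        if estimate_quality(output, mix) >= min_quality:
            best = float(att)  # keep the strongest attenuation that passes
    return best
```

Because the selection runs per item, material that separates cleanly can receive a strong attenuation while difficult material stays closer to the original mix, which is what signal-adaptive control amounts to in this sketch.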

Taxonomy

Topics: Speech and dialogue systems · Team Dynamics and Performance