DeepSpace: Dynamic Spatial and Source Cue Based Source Separation for Dialog Enhancement

Aaron Master; Lie Lu; Jonas Samuelsson; Heidi-Maria Lehtonen; Scott Norcross; Nathan Swedlow; and Audrey Howard

arXiv:2302.08202 · eess.AS · February 23, 2023

TL;DR

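DeepSpace is a source separation system for unguided dialog enhancement in TV and movie content. It combines dynamic spatial cues with deep-learning-based dialog classification and denoising, and in subjective listening tests it showed significantly better overall performance than the state-of-the-art systems available for testing.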

Contribution

The paper introduces an approach that combines spatio-level filtering (SLF) with deep-learning-based dialog classification and denoising to perform the source separation that unguided dialog enhancement requires.

Findings

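01. In subjective listening tests, DeepSpace shows significantly improved overall performance relative to the state-of-the-art systems available for testing.

02. The system uses dynamic spatial cues together with source cues to perform source separation.

03. The paper explores whether existing automated metrics can be used to evaluate unguided dialog enhancement systems.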

Abstract

Dialog Enhancement (DE) is a feature which allows a user to increase the level of dialog in TV or movie content relative to non-dialog sounds. When only the original mix is available, DE is "unguided," and requires source separation. In this paper, we describe the DeepSpace system, which performs source separation using both dynamic spatial cues and source cues to support unguided DE. Its technologies include spatio-level filtering (SLF) and deep-learning based dialog classification and denoising. Using subjective listening tests, we show that DeepSpace demonstrates significantly improved overall performance relative to state-of-the-art systems available for testing. We explore the feasibility of using existing automated metrics to evaluate unguided DE systems.
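To make the DE idea in the abstract concrete, here is a minimal sketch of the remix step that follows source separation: once a dialog estimate is available, the dialog is raised by a chosen number of decibels relative to everything else. This is not the DeepSpace method. The function name `enhance_dialog`, the `boost_db` parameter, and the assumption that the non-dialog residual equals the mix minus the dialog estimate are all illustrative choices not taken from the paper.

```python
def enhance_dialog(mix, dialog_estimate, boost_db=6.0):
    """Remix a signal so the dialog is boost_db louder relative to the rest.

    Hypothetical helper for illustration only. `mix` and `dialog_estimate`
    are equal-length sequences of audio samples; the non-dialog residual is
    assumed to be `mix - dialog_estimate`.
    """
    if len(mix) != len(dialog_estimate):
        raise ValueError("mix and dialog_estimate must have the same length")

    # Convert the requested boost from decibels to a linear amplitude gain.
    gain = 10.0 ** (boost_db / 20.0)

    enhanced = []
    for m, d in zip(mix, dialog_estimate):
        residual = m - d                     # everything that is not dialog
        enhanced.append(gain * d + residual)  # boosted dialog plus the rest
    return enhanced
```

With `boost_db=0.0` the function returns the original mix unchanged, which is a useful sanity check: any audible change comes only from the user-selected boost applied to the separated dialog.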

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Subtitles and Audiovisual Media