Exploring the Potential of Data-Driven Spatial Audio Enhancement Using a Single-Channel Model

Arthur N. dos Santos; Bruno S. Masiero; Túlio C. L. Mateus

arXiv:2404.14564 · eess.AS · April 24, 2024

TL;DR

This paper investigates whether single-channel speech enhancement models can be effectively adapted to multi-channel scenarios by processing each channel independently, with the aim of simplifying system design and reducing resource requirements.
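A minimal sketch of the per-channel strategy, assuming a single-channel enhancer can be treated as a function that maps one channel's samples to enhanced samples of the same length. The names `enhance_per_channel` and `toy_gate` are hypothetical and introduced only for illustration; `toy_gate` is a trivial amplitude gate standing in for a trained model, not one of the models evaluated in the paper.

```python
from typing import Callable, List, Sequence

Signal = List[float]          # one channel of samples
MultiChannel = List[Signal]   # channels x samples


def enhance_per_channel(
    mixture: Sequence[Sequence[float]],
    single_channel_model: Callable[[Sequence[float]], Signal],
) -> MultiChannel:
    """Hypothetical wrapper: run a single-channel enhancer on every channel independently."""
    if len({len(ch) for ch in mixture}) > 1:
        raise ValueError("all channels must have the same length")
    # Each channel is processed in isolation: the model never sees the other
    # channels, so inter-channel (spatial) relationships are not explicitly modeled.
    return [list(single_channel_model(ch)) for ch in mixture]


def toy_gate(channel: Sequence[float], threshold: float = 0.1) -> Signal:
    """Hypothetical stand-in for a trained model: zero out low-amplitude samples."""
    return [x if abs(x) >= threshold else 0.0 for x in channel]


stereo = [[0.5, 0.05, -0.3], [0.2, -0.02, 0.4]]
print(enhance_per_channel(stereo, toy_gate))
# [[0.5, 0.0, -0.3], [0.2, 0.0, 0.4]]
```

The appeal of this design is that any existing single-channel model can be reused unchanged for any channel count; the cost is that nothing in the wrapper constrains the outputs to stay consistent across channels.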

Contribution

It experimentally compares single-channel and multi-channel enhancement models and shows that single-channel methods are viable for multi-channel applications, at the cost of weaker preservation of spatial information than dedicated multi-channel models.

Findings

1. Single-channel models can be adapted for multi-channel enhancement.
2. Multi-channel models better preserve spatial information.
3. A trade-off exists between spatial preservation and intelligibility gains.
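The spatial-preservation finding can be illustrated with a simple inter-channel cue. The page does not state which spatial metrics the paper uses, so the sketch below assumes a broadband inter-channel level difference (ILD) in dB purely as an example. `ild_db` and `normalize_rms` are hypothetical helpers: `normalize_rms` mimics a per-channel step that rescales each channel on its own, while applying one shared gain to both channels leaves the ILD intact.

```python
import math
from typing import List, Sequence


def energy(channel: Sequence[float]) -> float:
    """Sum of squared samples of one channel."""
    return sum(x * x for x in channel)


def ild_db(left: Sequence[float], right: Sequence[float], eps: float = 1e-12) -> float:
    """Illustrative broadband inter-channel level difference (left relative to right), in dB."""
    return 10.0 * math.log10((energy(left) + eps) / (energy(right) + eps))


def normalize_rms(channel: Sequence[float], target: float = 0.1) -> List[float]:
    """Hypothetical per-channel processing: scale one channel to a fixed RMS, ignoring the others."""
    rms = math.sqrt(energy(channel) / len(channel)) if channel else 0.0
    return [x * target / rms for x in channel] if rms > 0 else list(channel)


left = [0.4, -0.4, 0.4, -0.4]
right = [0.1, -0.1, 0.1, -0.1]

original = ild_db(left, right)                                          # about 12.04 dB
independent = ild_db(normalize_rms(left), normalize_rms(right))         # collapses to about 0 dB
shared_gain = ild_db([0.5 * x for x in left], [0.5 * x for x in right])  # unchanged
print(round(original, 2), round(independent, 2), round(shared_gain, 2))
```

The point of the toy example is only that channel-independent processing is free to change relative levels between channels, whereas a model that sees all channels jointly can be designed to keep them consistent.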

Abstract

One key aspect differentiating data-driven single- and multi-channel speech enhancement and dereverberation methods is that both the problem formulation and complexity of the solutions are considerably more challenging in the latter case. Additionally, with limited computational resources, it is cumbersome to train models that require the management of larger datasets or those with more complex designs. In this scenario, an unverified hypothesis that single-channel methods can be adapted to multi-channel scenarios simply by processing each channel independently holds significant implications, boosting compatibility between sound scene capture and system input-output formats, while also allowing modern research to focus on other challenging aspects, such as full-bandwidth audio enhancement, competitive noise suppression, and unsupervised learning. This study verifies this hypothesis by…

Taxonomy

Topics: Speech and Audio Processing · Noise Effects and Management · Music and Audio Processing

Methods: Focus