Efficient Area-based and Speaker-Agnostic Source Separation

Martin Strauss, Okan Köpüklü

arXiv:2408.09810·eess.AS·August 20, 2024

TL;DR

This paper presents a real-time, low-complexity neural network method for area-based, speaker-agnostic source separation in virtual meetings, effectively isolating speech within a defined spatial region.

Contribution

The paper adapts an efficient neural network architecture to multi-channel input so that it performs area-based source separation without prior speaker information.

Findings

01 Effective separation of multiple speakers within the target area

02 Low computational complexity suitable for real-time processing

03 Demonstrated ability to identify sources within the spatial target area

Abstract

This paper introduces an area-based source separation method designed for virtual meeting scenarios. The aim is to preserve speech signals from an unspecified number of sources within a defined spatial area in front of a linear microphone array, while suppressing all other sounds. To this end, we employ an efficient neural network architecture adapted for multi-channel input to encompass the predefined target area. To evaluate the approach, training data and specific test scenarios, including multiple target and interfering speakers as well as background noise, are simulated. All models are rated according to DNSMOS and scale-invariant signal-to-distortion ratio. Our experiments show that the proposed method separates speech from multiple speakers within the target area well, while being of very low complexity and intended for real-time processing. In addition, a power reduction heatmap is…
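For readers unfamiliar with the scale-invariant signal-to-distortion ratio (SI-SDR) named in the abstract, the following is a minimal sketch of how that metric is commonly computed for a single-channel estimate and its reference. It is not taken from the paper: the function name `si_sdr`, the mean removal, the `eps` stabilizer, and the synthetic test signals are all illustrative assumptions.

```python
import numpy as np


def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB for two 1-D signals (hypothetical helper)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Assumption: remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the optimally scaled target.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    scaled_target = alpha * target
    # Everything in the estimate that is not explained by the target.
    residual = estimate - scaled_target
    ratio = (np.dot(scaled_target, scaled_target) + eps) / (
        np.dot(residual, residual) + eps
    )
    return 10.0 * np.log10(ratio)


# Synthetic example: white noise stands in for speech (illustration only).
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
estimate = target + 0.1 * noise

score = si_sdr(estimate, target)
# Rescaling the estimate should leave the score essentially unchanged.
rescaled_score = si_sdr(3.0 * estimate, target)
```

Because the reference is rescaled to best match the estimate before the ratio is taken, a separation model is not rewarded or penalized for the overall gain of its output, only for how much non-target energy remains.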

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques

Methods: Heatmap