Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction

Annika Briegleb; Mhd Modar Halimeh; Walter Kellermann

arXiv:2210.15512·eess.AS·June 13, 2023

TL;DR

This paper extends the complex-valued spatial autoencoder (COSPA), a neural spectro-spatial filter, to target speaker extraction by informing the network of the target speaker's position, which yields effective and interpretable extraction of a target speaker from multichannel mixtures.

Contribution

The paper introduces iCOSPA, an informed complex-valued spatial autoencoder that incorporates target speaker position, improving spatial selectivity and extraction performance.

Findings

1. iCOSPA effectively and flexibly extracts a target speaker from a mixture of speakers.

2. The architecture learns pronounced spatial selectivity patterns.

3. Extraction performance depends significantly on the training target and the reference signal.

Abstract

In conventional multichannel audio signal enhancement, spatial and spectral filtering are often performed sequentially. In contrast, it has been shown that for neural spatial filtering a joint approach of spectro-spatial filtering is more beneficial. In this contribution, we investigate the spatial filtering performed by such a time-varying spectro-spatial filter. We extend the recently proposed complex-valued spatial autoencoder (COSPA) for the task of target speaker extraction by leveraging its interpretable structure and purposefully informing the network of the target speaker's position. We show that the resulting informed COSPA (iCOSPA) effectively and flexibly extracts a target speaker from a mixture of speakers. We also find that the proposed architecture is well capable of learning pronounced spatial selectivity patterns and show that the results depend significantly on the…
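To illustrate what a "spatial selectivity pattern" is, the following minimal sketch computes the direction-dependent response of a conventional delay-and-sum spatial filter steered towards a known target direction. It is not iCOSPA, which is a neural network whose details are not given in the abstract. The uniform linear array, the far-field plane-wave model, and all names and parameter values (`steering_vector`, `delay_and_sum_response`, `selectivity_pattern`, 4 microphones, 4 cm spacing, 2 kHz) are assumptions chosen for illustration only.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def steering_vector(angle_deg, num_mics, spacing_m, freq_hz):
    """Far-field steering vector of a uniform linear array (assumed geometry)."""
    # Inter-microphone delay of a plane wave arriving from angle_deg,
    # measured from the array axis.
    delay = spacing_m * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND
    return [cmath.exp(-2j * math.pi * freq_hz * m * delay) for m in range(num_mics)]


def delay_and_sum_response(target_deg, source_deg,
                           num_mics=4, spacing_m=0.04, freq_hz=2000.0):
    """Magnitude response of a filter steered to target_deg for a source at source_deg."""
    weights = steering_vector(target_deg, num_mics, spacing_m, freq_hz)
    wave = steering_vector(source_deg, num_mics, spacing_m, freq_hz)
    # Align the channels for the target direction, then average them.
    output = sum(w.conjugate() * d for w, d in zip(weights, wave)) / num_mics
    return abs(output)


def selectivity_pattern(target_deg, angles=range(0, 181, 15)):
    """Response over a grid of candidate source directions (in degrees)."""
    return {a: delay_and_sum_response(target_deg, a) for a in angles}


pattern = selectivity_pattern(target_deg=90)
for angle, gain in pattern.items():
    print(f"{angle:3d} deg: {gain:.3f}")
```

In this fixed filter, the target position enters only through the steering weights, and the response is largest in the target direction. An informed neural filter such as iCOSPA instead receives the target position as an input and learns its selectivity from data.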

Taxonomy

Topics: Speech and Audio Processing · Underwater Acoustics Research · Speech Recognition and Synthesis