Array Configuration-Agnostic Personalized Speech Enhancement using Long-Short-Term Spatial Coherence

Yicheng Hsu; Yonghan Lee; Mingsian R. Bai

arXiv:2211.08748 · eess.AS · November 17, 2022

TL;DR

This paper introduces an array configuration-agnostic multichannel personalized speech enhancement system. It relies on a novel spatial feature to suppress speechlike interferers, such as competing speakers and TV dialogue, across diverse microphone array setups in adverse acoustic conditions.

Contribution

The paper proposes a new spatial feature, long-short-term spatial coherence (LSTSC), as the input to a convolutional recurrent network that monitors the voice activity of the target speaker, so that the system does not depend on a specific array configuration.

Findings

01 The proposed system outperforms the baselines in unseen rooms and array configurations.

02 An equivalent rectangular bandwidth (ERB) scaled LSTSC feature reduces computational cost while maintaining performance.

03 The system suppresses both TV noise and competing speakers.
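
The abstract attributes the cost reduction to an equivalent rectangular bandwidth (ERB) scaled feature. As a minimal sketch of the general idea, and not the paper's implementation, the Python below pools a per-bin feature onto bands spaced evenly on the Glasberg and Moore ERB-rate scale, which shrinks the feature dimension fed to a network. The function names, the 257-bin/16 kHz setup, the 32-band count, and the random stand-in feature are all assumptions made for illustration; the band layout actually used in the paper is not stated here.

```python
import numpy as np

def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the Glasberg and Moore (1990) ERB-rate scale."""
    return 21.4 * np.log10(1.0 + 0.00437 * np.asarray(f_hz, dtype=float))

def erb_pooling_matrix(n_bins, sample_rate, n_bands):
    """Return an (n_bands, n_bins) matrix that averages STFT bins per ERB-spaced band."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    erb = hz_to_erb_rate(freqs)
    # Band edges are equally spaced on the ERB-rate axis (an assumed layout).
    edges = np.linspace(erb[0], erb[-1], n_bands + 1)
    # Assign every bin to one band; clip so the top bin falls in the last band.
    band_of_bin = np.clip(np.searchsorted(edges, erb, side="right") - 1, 0, n_bands - 1)
    pool = np.zeros((n_bands, n_bins))
    pool[band_of_bin, np.arange(n_bins)] = 1.0
    counts = pool.sum(axis=1, keepdims=True)
    # A band that receives no bins stays all-zero instead of dividing by zero.
    return pool / np.maximum(counts, 1.0)

# Hypothetical per-bin spatial feature: 100 frames x 257 bins (512-point FFT at 16 kHz).
rng = np.random.default_rng(0)
feature = rng.uniform(-1.0, 1.0, size=(100, 257))

pool = erb_pooling_matrix(n_bins=257, sample_rate=16000, n_bands=32)
feature_erb = feature @ pool.T  # one averaged value per frame and band
print(feature.shape, "->", feature_erb.shape)
```

In this sketch the per-frame feature dimension drops from 257 to 32, so the input that any downstream network has to process is roughly eight times smaller. Low-frequency bands keep near single-bin resolution while high-frequency bands average many bins.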

Abstract

Personalized speech enhancement (PSE) has been a field of active research for suppression of speechlike interferers such as competing speakers or TV dialogues. Compared with single-channel approaches, multichannel PSE systems can be more effective in adverse acoustic conditions by leveraging the spatial information in microphone signals. However, implementing multichannel PSE systems that accommodate a wide range of array topologies in household applications can be challenging. To develop an array configuration-agnostic PSE system, we define a spatial feature termed the long-short-term spatial coherence (LSTSC) as the input feature to a convolutional recurrent network to monitor the voice activity of the target speaker. As another refinement, an equivalent rectangular bandwidth (ERB) scaled LSTSC feature can be used to reduce the computational cost. Experiments were conducted to compare the proposed PSE…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques