One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement

Hassan Taherian; Sefik Emre Eskimez; Takuya Yoshioka; Huaming Wang; Zhuo Chen; Xuedong Huang

arXiv:2110.10330·eess.AS·October 22, 2021

Open Access

TL;DR

This paper introduces a novel array-geometry-agnostic multi-channel personalized speech enhancement model that leverages spatial information to improve speech quality and recognition accuracy across various microphone array configurations.

Contribution

The paper proposes a new causal multi-channel PSE model that is independent of microphone array geometry, enhancing performance and generalization to unseen array configurations.

Findings

01. Outperforms geometry-specific models in speech quality

02. Improves automatic speech recognition accuracy

03. Effective on unseen microphone array geometries

Abstract

With the recent surge in the use of video conferencing tools, providing high-quality speech signals and accurate captions has become essential for conducting day-to-day business or connecting with friends and family. Single-channel personalized speech enhancement (PSE) methods show promising results compared with unconditional speech enhancement (SE) methods in these scenarios because they can remove interfering speech in addition to environmental noise. In this work, we leverage the spatial information afforded by microphone arrays to further improve the performance of such systems. We investigate the relative importance of speaker embeddings and spatial features. Moreover, we propose a new causal array-geometry-agnostic multi-channel PSE model, which can generate a high-quality enhanced signal from an arbitrary microphone geometry. Experimental results show that the proposed geometry…
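To make the idea of geometry-agnostic spatial features more concrete, the following is a minimal illustrative sketch, not the architecture proposed in the paper. It assumes that spatial information is encoded as inter-channel phase differences (IPDs) relative to a reference microphone and that pooling over channels yields a feature whose size does not depend on the number of microphones. The function names `stft` and `pooled_ipd_features`, the frame sizes, and the mean-pooling choice are all assumptions made for illustration.

```python
import numpy as np


def stft(x, frame_len=512, hop=256):
    """Naive STFT of a 1-D signal; returns a (frames, bins) complex array."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)


def pooled_ipd_features(signals, ref=0):
    """Hypothetical geometry-agnostic spatial feature (illustrative only).

    signals: array of shape (channels, samples), with at least 2 channels.
    Returns an array of shape (frames, 2 * bins) whose size is independent
    of the number of microphones.
    """
    specs = np.stack([stft(ch) for ch in signals])  # (C, T, F)
    # Phase difference of every channel with respect to the reference channel.
    ipd = np.angle(specs * np.conj(specs[ref]))  # (C, T, F)
    others = np.delete(ipd, ref, axis=0)  # drop the trivial reference row
    # cos/sin encoding avoids phase-wrapping discontinuities; averaging over
    # channels makes the result invariant to channel count and ordering.
    return np.concatenate(
        [np.cos(others).mean(axis=0), np.sin(others).mean(axis=0)], axis=-1
    )
```

Under these assumptions, a 3-microphone and a 7-microphone recording of the same length produce feature matrices of the same shape, so a single downstream network could in principle consume either. Mean pooling discards which microphone contributed which phase difference, which is the trade-off this kind of pooling makes for geometry independence.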

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques

Methods: Online Normalization