Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features

Yicheng Hsu; Yonghan Lee; Mingsian R. Bai

arXiv:2112.05686 · eess.AS · May 2, 2022 · 1 citation

TL;DR

This paper introduces a learning-based speech enhancement system for teleconferencing that leverages spatial-spectral features and speaker embeddings, demonstrating improved robustness and performance over a baseline system.

Contribution

The study proposes a target speech extraction system that combines a novel spatial coherence feature with a learning-based network. The system outperforms a baseline system and remains robust across microphone array geometries and numbers of microphones.

Findings

01. Superior enhancement performance compared with the baseline
02. Robustness to microphone array geometries
03. Effectiveness in real-world teleconferencing scenarios

Abstract

Teleconferencing has become essential during the COVID-19 pandemic. However, in real-world applications, speech quality can deteriorate due to, for example, background interference, noise, or reverberation. To solve this problem, the target speech can be extracted from the mixture signals with the aid of the user's vocal features. The proposed system accounts for various features, including speaker embeddings derived from user enrollment and a novel long-short-term spatial coherence (LSTSC) feature pertaining to the target speaker activity. As a learning-based approach, a target speech sifting network was employed to extract the relevant features. The network trained with LSTSC in the proposed approach is robust to microphone array geometries and the number of microphones. Furthermore, the proposed enhancement system was compared with a baseline system with speaker…
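The abstract does not spell out how LSTSC is computed, so the following is only a hypothetical sketch of why a coherence-based feature can be independent of the number of microphones. It assumes: a plain Hann-window STFT; interchannel coherence of every microphone against reference microphone 0, smoothed recursively over frames with a fast factor (`alpha_short`) and a slow factor (`alpha_long`); and a per-bin normalized similarity between the short-term and long-term coherence vectors. The function names, smoothing constants, and the similarity formula are illustrative assumptions, not the authors' definition. Because the output is one real value per time-frequency bin, its shape does not depend on the array size.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Multichannel STFT. x: (channels, samples) -> (channels, frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    frames = np.stack([x[:, t * hop:t * hop + n_fft] * win for t in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=-1)

def recursive_coherence(X, alpha, eps=1e-12):
    """Coherence of each mic m >= 1 with reference mic 0, recursively smoothed over frames.

    Returns a complex array of shape (frames, freqs, channels - 1)."""
    M, T, F = X.shape
    cross = np.zeros((F, M - 1), dtype=complex)  # running average of X_m * conj(X_0)
    auto = np.zeros((F, M))                      # running average of |X_m|^2
    out = np.empty((T, F, M - 1), dtype=complex)
    for t in range(T):
        x = X[:, t, :]  # (M, F)
        cross = alpha * cross + (1 - alpha) * (x[1:] * np.conj(x[0])).T
        auto = alpha * auto + (1 - alpha) * (np.abs(x) ** 2).T
        out[t] = cross / np.sqrt(auto[:, 1:] * auto[:, :1] + eps)
    return out

def long_short_similarity(X, alpha_short=0.6, alpha_long=0.98, eps=1e-12):
    """HYPOTHETICAL feature: |<g_short, g_long>| / (||g_short|| * ||g_long||) per bin.

    The result lies in [0, 1] and has shape (frames, freqs) for any number of mics."""
    g_s = recursive_coherence(X, alpha_short)
    g_l = recursive_coherence(X, alpha_long)
    inner = np.abs(np.sum(g_s * np.conj(g_l), axis=-1))
    norms = np.linalg.norm(g_s, axis=-1) * np.linalg.norm(g_l, axis=-1)
    return inner / (norms + eps)

# Toy usage: one white-noise source reaching three mics with small pure delays.
rng = np.random.default_rng(0)
fs = 16000
s = rng.standard_normal(fs)
x = np.stack([s, np.roll(s, 3), np.roll(s, 7)])
x = x + 0.01 * rng.standard_normal(x.shape)
X = stft(x)
feat = long_short_similarity(X)
print(feat.shape)  # → (61, 257)
```

For a single dominant source, the short-term and long-term coherence vectors share the same interchannel phase pattern, so the similarity stays close to 1; the Cauchy-Schwarz inequality keeps it within [0, 1]. Adding microphones only lengthens the vectors inside the inner product, which is one plausible reason a network trained on such a feature would not need retraining for a new array.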

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing