Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features
Yicheng Hsu, Yonghan Lee, Mingsian R. Bai

TL;DR
This paper introduces a learning-based speech enhancement system for teleconferencing that leverages spatial-spectral features and speaker embeddings, demonstrating improved robustness and performance over baseline methods.
Contribution
The study proposes a novel target speech extraction system that combines spatial coherence features with a learning-based network. The system is robust to microphone array geometry and the number of microphones, and it outperforms a baseline system.
Findings
Superior enhancement performance compared to baseline
Robustness to microphone array geometries
Effective in real-world teleconferencing scenarios
Abstract
Teleconferencing has become essential during the COVID-19 pandemic. In real-world applications, however, speech quality can deteriorate due to background interference, noise, or reverberation. To address this, target speech can be extracted from the mixture signals with the aid of the user's vocal features. The proposed system accounts for several such features, including speaker embeddings derived from user enrollment and a novel long-short-term spatial coherence (LSTSC) feature pertaining to target speaker activity. As a learning-based approach, a target speech sifting network was employed to extract the relevant features. The network trained with LSTSC is robust to microphone array geometry and to the number of microphones. Furthermore, the proposed enhancement system was compared with a baseline system with speaker…
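The abstract does not say how LSTSC is computed. As a generic illustration only, and not the authors' formulation, the sketch below computes the standard exponentially smoothed complex coherence between two microphone channels at a single STFT frequency bin, once with a short memory and once with a long memory. The helper names `recursive_coherence` and `short_long_coherence` and the smoothing factors 0.3 and 0.9 are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def recursive_coherence(
    x: Sequence[complex], y: Sequence[complex], alpha: float, eps: float = 1e-12
) -> List[complex]:
    """Exponentially smoothed complex coherence of two channels at one STFT bin.

    Generic textbook estimator (an assumption here, not the paper's LSTSC):
        Phi_xy(t) = alpha * Phi_xy(t-1) + (1 - alpha) * X(t) * conj(Y(t))
        Gamma(t)  = Phi_xy(t) / sqrt(Phi_xx(t) * Phi_yy(t))
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of frames")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    phi_xy = 0j   # smoothed cross-power spectral density
    phi_xx = 0.0  # smoothed auto-power of channel x
    phi_yy = 0.0  # smoothed auto-power of channel y
    out: List[complex] = []
    for xt, yt in zip(x, y):
        phi_xy = alpha * phi_xy + (1 - alpha) * xt * complex(yt).conjugate()
        phi_xx = alpha * phi_xx + (1 - alpha) * abs(xt) ** 2
        phi_yy = alpha * phi_yy + (1 - alpha) * abs(yt) ** 2
        denom = (phi_xx * phi_yy) ** 0.5
        # Return zero coherence while both channels are still silent.
        out.append(phi_xy / denom if denom > eps else 0j)
    return out


def short_long_coherence(
    x: Sequence[complex],
    y: Sequence[complex],
    alpha_short: float = 0.3,  # hypothetical short-term smoothing factor
    alpha_long: float = 0.9,   # hypothetical long-term smoothing factor
) -> Tuple[List[complex], List[complex]]:
    """Short-memory and long-memory coherence tracks for the same channel pair."""
    return (
        recursive_coherence(x, y, alpha_short),
        recursive_coherence(x, y, alpha_long),
    )


if __name__ == "__main__":
    # Channel y is a scaled, phase-shifted copy of x, so coherence magnitude is 1.
    x = [1j ** k for k in range(50)]
    y = [(0.5 - 2j) * v for v in x]
    short, long_ = short_long_coherence(x, y)
    print(round(abs(short[-1]), 6), round(abs(long_[-1]), 6))
```

The intuition behind pairing two memories is that a fully coherent (directional) source keeps the magnitude near 1 under both, while content that does not stay consistent across frames drives the long-memory coherence down faster than the short-memory one. How the paper actually combines these quantities into a feature is not stated in this summary.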
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
