Spatial Aware Multi-Task Learning Based Speech Separation
Wei Sun, Mei Wang, Lili Qiu

TL;DR
This paper introduces SAMS, a real-time, multi-task learning system that uses spatial awareness to improve speech separation during teleconferencing, enhancing privacy and audio quality.
Contribution
The paper presents a novel spatial-aware multi-task learning framework with fine-grained location embeddings for effective speech separation in teleconferencing.
Findings
Effective separation of target speech in noisy environments
Inference accelerated enough to run in real time
Improved privacy and audio clarity during calls
Abstract
During the COVID-19 pandemic, online meetings have become an indispensable part of our lives. This trend is likely to continue due to their convenience and broad reach. However, background noise from family members, roommates, and office-mates not only degrades voice quality but also raises serious privacy issues. In this paper, we develop a novel system, called Spatial Aware Multi-task learning-based Separation (SAMS), to extract audio signals from the target user during teleconferencing. Our solution consists of three novel components: (i) generating fine-grained location embeddings from the user's voice and inaudible tracking sound, which contains the user's position and rich multipath information, (ii) developing a source separation neural network using multi-task learning to jointly optimize source separation and location, and (iii) significantly speeding up inference to provide a…
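As a rough illustration of component (ii), the sketch below combines a separation objective with a location objective into a single training loss. The abstract does not state which losses SAMS uses, so the choices here are assumptions made for illustration only: negative scale-invariant SNR (SI-SNR) for separation, mean squared error for location, and a hypothetical weight `loc_weight` that balances the two. All function names are hypothetical.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length signals."""
    n = len(target)
    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target to get the scaled reference.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    proj = [scale * t for t in tgt]
    # Whatever is left after the projection is treated as noise.
    noise = [e - p for e, p in zip(est, proj)]
    proj_energy = sum(p * p for p in proj)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((proj_energy + eps) / (noise_energy + eps))


def location_mse(pred_xy, true_xy):
    """Mean squared error between predicted and true coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred_xy, true_xy)) / len(true_xy)


def joint_loss(estimate, target, pred_xy, true_xy, loc_weight=0.1):
    """Assumed multi-task loss: separation term plus weighted location term."""
    separation_term = -si_snr(estimate, target)  # higher SI-SNR means lower loss
    location_term = location_mse(pred_xy, true_xy)
    return separation_term + loc_weight * location_term


if __name__ == "__main__":
    target = [0.0, 1.0, 0.0, -1.0]
    noisy = [0.1, 0.9, -0.1, -1.1]
    print(joint_loss(noisy, target, (0.2, 0.1), (0.0, 0.0)))
```

In a sketch like this, a cleaner separated signal lowers the first term and a more accurate position estimate lowers the second, so both tasks shape the same training signal. The actual network, losses, and weighting used in the paper may differ.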
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques
