Spatial Aware Multi-Task Learning Based Speech Separation

Wei Sun; Mei Wang; Lili Qiu

arXiv:2207.10229·cs.SD·July 22, 2022

Open Access

TL;DR

This paper introduces SAMS, a real-time, multi-task learning system that uses spatial awareness to improve speech separation during teleconferencing, enhancing privacy and audio quality.

Contribution

The paper presents a novel spatial-aware multi-task learning framework with fine-grained location embeddings for effective speech separation in teleconferencing.

Findings

01 Effective separation of target speech in noisy environments

02 Real-time inference speedup achieved

03 Improved privacy and audio clarity during calls

Abstract

During the COVID-19 pandemic, online meetings have become an indispensable part of our lives. This trend is likely to continue due to their convenience and broad reach. However, background noise from other family members, roommates, and office-mates not only degrades voice quality but also raises serious privacy issues. In this paper, we develop a novel system, called Spatial Aware Multi-task learning-based Separation (SAMS), to extract the target user's audio signal during teleconferencing. Our solution consists of three novel components: (i) generating fine-grained location embeddings from the user's voice and an inaudible tracking sound, which contain the user's position and rich multipath information, (ii) developing a source separation neural network using multi-task learning to jointly optimize source separation and location, and (iii) significantly speeding up inference to provide a…
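To make the idea of jointly optimizing separation and location concrete, here is a minimal sketch of a multi-task objective. It is an illustration only, not the loss used in SAMS: the abstract does not give the loss terms, so the negative SI-SNR separation term, the mean-squared-error location term, the function names, and the `location_weight` value are all assumptions.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (dB) between two 1-D signals."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to obtain the scaled reference.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    noise = estimate - projection
    ratio = (np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)


def multitask_loss(est_speech, ref_speech, est_location, ref_location,
                   location_weight=0.1):
    """Hypothetical joint objective: separation term plus weighted location term.

    The weighting scheme and both loss choices are assumptions for illustration.
    """
    separation_loss = -si_snr(est_speech, ref_speech)  # higher SI-SNR is better
    location_loss = float(np.mean((est_location - ref_location) ** 2))
    return separation_loss + location_weight * location_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal(1600)
    noisy = reference + 0.5 * rng.standard_normal(1600)
    position = np.array([1.0, 2.0])  # made-up 2-D coordinates
    print(multitask_loss(noisy, reference, position + 0.3, position))
```

In a sketch like this, the shared network would receive gradients from both terms, so a single weight controls how strongly the location task shapes the separation features.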

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques