Learning to Separate Voices by Spatial Regions

Zhongweiyang Xu; Romit Roy Choudhury

arXiv:2207.04203 · cs.SD · July 18, 2022 · 1 citation

TL;DR

This paper introduces a self-supervised, region-based voice separation method for binaural audio. It personalizes separation to the individual user and handles mixtures without assuming a known or fixed maximum number of sources.

Contribution

It proposes a two-stage self-supervised framework that learns the spatial properties of each region for personalized voice separation, removing the need for a known or fixed maximum number of sources.

Findings

01 Region-wise separation improves the handling of multiple sources.

02 Personalized models outperform generic supervised models.

03 The method shows promising results in real-world applications such as noise cancellation.

Abstract

We consider the problem of audio voice separation for binaural applications, such as earphones and hearing aids. While today's neural networks perform remarkably well (separating 4+ sources with 2 microphones), they assume a known or fixed maximum number of sources, K. Moreover, today's models are trained in a supervised manner, using training data synthesized from generic sources, environments, and human head shapes. This paper intends to relax both these constraints at the expense of a slight alteration in the problem definition. We observe that, when a received mixture contains too many sources, it is still helpful to separate them by region, i.e., isolating signal mixtures from each conical sector around the user's head. This requires learning the fine-grained spatial properties of each region, including the signal distortions imposed by a person's head. We propose a two-stage…
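The region-wise problem definition in the abstract can be illustrated with a small sketch. It only shows how per-region target mixtures might be formed from known sources; it is not the paper's two-stage method. The sector geometry is an assumption: each source is described by a single angle in [0, 180] degrees, and sectors are equal-width bins of that angle. The function names `sector_index` and `region_mixtures` are hypothetical.

```python
from typing import List, Sequence


def sector_index(angle_deg: float, num_sectors: int) -> int:
    """Map an angle in [0, 180] degrees to one of num_sectors equal-width bins.

    Assumption: regions are equal-width angular bins; the paper may define
    its conical sectors differently.
    """
    if num_sectors < 1:
        raise ValueError("num_sectors must be at least 1")
    if not 0.0 <= angle_deg <= 180.0:
        raise ValueError("angle must lie in [0, 180] degrees")
    width = 180.0 / num_sectors
    # Clamp so that exactly 180 degrees falls in the last sector.
    return min(int(angle_deg // width), num_sectors - 1)


def region_mixtures(
    sources: Sequence[Sequence[float]],
    angles_deg: Sequence[float],
    num_sectors: int,
) -> List[List[float]]:
    """Sum the source signals that fall in each sector, one mixture per region."""
    if len(sources) != len(angles_deg):
        raise ValueError("need exactly one angle per source")
    length = len(sources[0]) if sources else 0
    mixtures = [[0.0] * length for _ in range(num_sectors)]
    for signal, angle in zip(sources, angles_deg):
        if len(signal) != length:
            raise ValueError("all sources must have the same length")
        k = sector_index(angle, num_sectors)
        for t, sample in enumerate(signal):
            mixtures[k][t] += sample
    return mixtures


# Three sources, four sectors of 45 degrees each: the first two sources
# share sector 0, the third falls in sector 2.
targets = region_mixtures(
    [[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]],
    [10.0, 20.0, 100.0],
    num_sectors=4,
)
```

The number of output mixtures depends only on the number of sectors, not on the number of sources, which is why a region-wise formulation does not need a known source count.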

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing