Location-based training for multi-channel talker-independent speaker separation

Hassan Taherian, Ke Tan, and DeLiang Wang

arXiv:2110.04289 · eess.AS · October 11, 2021 · 1 citation

TL;DR

This paper introduces location-based training (LBT), which uses the spatial information captured by microphone arrays to resolve the permutation ambiguity in multi-channel speaker separation. LBT outperforms permutation-invariant training (PIT) on two- and three-speaker mixtures. Distance-based LBT helps further when speakers sit at similar azimuths.

Contribution

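The study proposes location-based training, which assigns each separated output to a speaker according to that speaker's spatial location. Because location fixes the output-to-speaker assignment, training does not search over all speaker permutations as PIT does. This simplifies training and improves separation performance over PIT.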
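The following minimal Python sketch contrasts the two training targets. It is not the authors' implementation. It makes several assumptions:

- A mean-squared-error loss on toy signals stands in for the paper's actual training objective.
- Azimuth-based LBT maps output channel k to the speaker with the k-th smallest azimuth.
- The names `mse`, `pit_loss`, and `lbt_loss` are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Signal = Sequence[float]

def mse(est: Signal, ref: Signal) -> float:
    """Mean squared error between two equal-length signals (stand-in loss)."""
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)

def pit_loss(estimates: List[Signal], references: List[Signal]) -> Tuple[float, int]:
    """PIT: score every output-to-speaker assignment and keep the best one.

    Returns (loss, number of assignments scored); the count is N! for N speakers.
    """
    n = len(references)
    perms = list(permutations(range(n)))
    losses = [
        sum(mse(estimates[i], references[p[i]]) for i in range(n)) / n
        for p in perms
    ]
    return min(losses), len(perms)

def lbt_loss(estimates: List[Signal], references: List[Signal],
             azimuths: Sequence[float]) -> Tuple[float, int]:
    """Azimuth-based LBT (hypothetical convention): output k targets the
    speaker with the k-th smallest azimuth, so one fixed assignment is scored."""
    n = len(references)
    order = sorted(range(n), key=lambda s: azimuths[s])
    loss = sum(mse(estimates[k], references[order[k]]) for k in range(n)) / n
    return loss, 1

# Toy three-speaker example; the azimuths (degrees) are made up for illustration.
refs = [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0], [0.5, 0.5, 0.5]]
azimuths = [120.0, 30.0, 75.0]
ests = [refs[1], refs[2], refs[0]]      # outputs in ascending-azimuth order
print(pit_loss(ests, refs))             # (0.0, 6)
print(lbt_loss(ests, refs, azimuths))   # (0.0, 1)

ests_unordered = [refs[0], refs[1], refs[2]]  # right speakers, wrong spatial order
print(pit_loss(ests_unordered, refs)[0])                # 0.0
print(lbt_loss(ests_unordered, refs, azimuths)[0] > 0)  # True
```

PIT reaches the zero-loss assignment only by trying all 3! = 6 orderings. LBT instead scores a single assignment fixed by speaker location. The trade-off is that LBT penalizes a network that emits the right speakers in the wrong order, and this is what pushes the outputs to follow the spatial arrangement.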

Findings

01

LBT outperforms PIT in separating two- and three-speaker mixtures.

02

Azimuth-based training is more effective than distance-based training.

03

Dynamic selection of training type further improves separation results.

Abstract

Permutation-invariant training (PIT) is a dominant approach for addressing the permutation ambiguity problem in talker-independent speaker separation. Leveraging spatial information afforded by microphone arrays, we propose a new training approach to resolving permutation ambiguities for multi-channel speaker separation. The proposed approach, named location-based training (LBT), assigns speakers on the basis of their spatial locations. This training strategy is easy to apply, and organizes speakers according to their positions in physical space. Specifically, this study investigates azimuth angles and source distances for location-based training. Evaluation results on separating two- and three-speaker mixtures show that azimuth-based training consistently outperforms PIT, and distance-based training further improves the separation performance when speaker azimuths are close.…
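The abstract's two location criteria, azimuth and distance, differ only in the key used to order the training targets. The sketch below is illustrative, not the paper's code. The following are all hypothetical assumptions:

- The function names `lbt_target_order` and `arrange_targets`.
- The ascending sort direction.
- Azimuths measured from a fixed array reference direction, so no 0°/360° wraparound occurs.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def lbt_target_order(azimuths_deg: Sequence[float],
                     distances_m: Sequence[float],
                     criterion: str = "azimuth") -> List[int]:
    """Return speaker indices in the order they are assigned to output channels.

    Hypothetical convention: ascending azimuth (degrees from a fixed array
    reference direction) or ascending source-to-array distance (meters).
    """
    if criterion == "azimuth":
        keys = azimuths_deg
    elif criterion == "distance":
        keys = distances_m
    else:
        raise ValueError(f"unknown criterion: {criterion!r}")
    return sorted(range(len(keys)), key=lambda s: keys[s])

def arrange_targets(references: Sequence[T], order: Sequence[int]) -> List[T]:
    """Reorder reference signals so target k is the k-th speaker in `order`."""
    return [references[s] for s in order]

# Two speakers at nearly the same azimuth but clearly different distances.
azimuths = [40.0, 42.0]
distances = [2.5, 1.0]
print(lbt_target_order(azimuths, distances, "azimuth"))   # [0, 1]
print(lbt_target_order(azimuths, distances, "distance"))  # [1, 0]
```

In this example the speakers are only 2° apart in azimuth but 1.5 m apart in distance. The azimuth key therefore orders them by a small angular margin, while the distance key orders them by a large one. This is one plausible reading of why distance-based training helps when speaker azimuths are close.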

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing