DNN Speaker Tracking with Embeddings

Carlos Rodrigo Castillo-Sanchez; Leibny Paola Garcia-Perera; Anabel Martin-Gonzalez

arXiv:2007.10248·cs.SD·July 21, 2020

Open Access

TL;DR

This paper introduces a novel embedding-based neural network method for online speaker tracking that significantly improves diarization accuracy over traditional PLDA-based systems, demonstrating robustness across datasets and conditions.

Contribution

The paper presents a new CNN-based speaker tracking approach that mimics PLDA classifiers, offering improved performance and robustness in multi-speaker scenarios.

Findings

01. 17% DER improvement on DIHARD II dataset

02. Effective in overlapping and non-overlapping speech segments

03. Robust against non-target speaker interference

Abstract

In multi-speaker applications, it is common to have pre-computed models from enrolled speakers. Using these models to identify the instances in which these speakers intervene in a recording is the task of speaker tracking. In this paper, we propose a novel embedding-based speaker tracking method. Specifically, our design is based on a convolutional neural network that mimics a typical speaker verification PLDA (probabilistic linear discriminant analysis) classifier and finds the regions uttered by the target speakers in an online fashion. The system was studied from two different perspectives: diarization and tracking; results on both show a significant improvement over the PLDA baseline under the same experimental conditions. Two standard public datasets, CALLHOME and DIHARD II single channel, were modified to create two-speaker subsets with overlapping and non-overlapping regions. We…
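To make the tracking task concrete, here is a minimal sketch of online tracking against enrolled speaker models. It is not the paper's method: plain cosine similarity stands in for both the PLDA baseline and the proposed CNN scorer, and the names `cosine_score` and `track_speakers`, the 3-dimensional toy embeddings, and the 0.5 threshold are all assumptions made for illustration.

```python
import numpy as np


def cosine_score(embedding, model):
    """Cosine similarity between a segment embedding and an enrolled model."""
    return float(
        np.dot(embedding, model)
        / (np.linalg.norm(embedding) * np.linalg.norm(model))
    )


def track_speakers(segment_embeddings, enrolled_models, threshold=0.5):
    """Yield, for each incoming segment, the set of enrolled speakers detected.

    Segments are scored one at a time in arrival order (online), and each
    enrolled speaker is decided independently, so a segment can be assigned
    to several speakers (overlap) or to none (non-target speech).
    The threshold is an arbitrary illustrative value.
    """
    for emb in segment_embeddings:
        yield {
            name
            for name, model in enrolled_models.items()
            if cosine_score(emb, model) > threshold
        }


# Toy enrolled models and segment embeddings (hypothetical values).
enrolled = {
    "spk_a": np.array([1.0, 0.0, 0.0]),
    "spk_b": np.array([0.0, 1.0, 0.0]),
}
segments = [
    np.array([0.9, 0.1, 0.0]),  # close to spk_a only
    np.array([0.7, 0.7, 0.0]),  # between both models, as in overlapped speech
    np.array([0.0, 0.0, 1.0]),  # unlike either enrolled speaker
]

decisions = list(track_speakers(segments, enrolled))
for index, active in enumerate(decisions):
    print(index, sorted(active))
```

Scoring each enrolled speaker independently is what lets a single segment be marked for both speakers or for neither, which corresponds to the overlapping and non-target cases mentioned in the findings.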

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing