A Real-time Speaker Diarization System Based on Spatial Spectrum

Siqi Zheng; Weilong Huang; Xianliang Wang; Hongbin Suo; Jinwei Feng; Zhijie Yan

arXiv:2107.09321·cs.SD·July 21, 2021

TL;DR

This paper presents a real-time speaker diarization system that combines a differential directional microphone array with online speaker-location joint clustering to localize, identify, and track speakers in far-field conditions, including when speech overlaps or participants enter and leave the conversation.

Contribution

It introduces a novel spatial spectrum-based approach integrated with online clustering and speaker number detection for improved real-time speaker diarization.

Findings

01. Effective separation of overlapping speech.

02. Accurate real-time speaker tracking and identification.

03. Significant performance improvements over existing methods.

Abstract

In this paper, we describe a speaker diarization system that enables localization and identification of all speakers present in a conversation or meeting. We propose a novel systematic approach to tackle several long-standing challenges in speaker diarization tasks: (1) to segment and separate overlapping speech from two speakers; (2) to estimate the number of speakers when participants may enter or leave the conversation at any time; (3) to provide accurate speaker identification on short text-independent utterances; (4) to track speakers' movement during the conversation; (5) to detect speaker-change incidents in real time. First, a differential directional microphone array-based approach is exploited to capture the target speakers' voices in adverse far-field environments. Second, an online speaker-location joint clustering approach is proposed to keep track of speaker location. Third,…
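The abstract does not spell out how the online speaker-location joint clustering works, so the following is only a minimal illustrative sketch of the general idea, not the authors' algorithm. It assumes each speech segment arrives with a speaker embedding and a direction-of-arrival angle in degrees, and it assigns the segment to the nearest existing cluster under a weighted sum of cosine distance and angular distance, opening a new cluster when no match is close enough. The class name `OnlineJointClusterer`, the parameters `threshold` and `angle_weight`, and their default values are all hypothetical.

```python
import math
from dataclasses import dataclass


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def angle_distance(a_deg, b_deg):
    """Shortest angular separation on a circle, normalized to [0, 1]."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d) / 180.0


@dataclass
class Cluster:
    embedding: list   # running mean of speaker embeddings
    angle_deg: float  # running estimate of the speaker's direction
    count: int = 1    # number of segments assigned so far


class OnlineJointClusterer:
    """Hypothetical online clusterer over (embedding, angle) pairs."""

    def __init__(self, threshold=0.3, angle_weight=0.5):
        self.threshold = threshold        # assumed cost above which a new speaker is opened
        self.angle_weight = angle_weight  # assumed trade-off between voice and location
        self.clusters = []

    def assign(self, embedding, angle_deg):
        """Return the cluster index for one segment, creating a cluster if needed."""
        best, best_cost = None, float("inf")
        for i, c in enumerate(self.clusters):
            cost = ((1.0 - self.angle_weight) * cosine_distance(embedding, c.embedding)
                    + self.angle_weight * angle_distance(angle_deg, c.angle_deg))
            if cost < best_cost:
                best, best_cost = i, cost
        if best is None or best_cost > self.threshold:
            self.clusters.append(Cluster(list(embedding), angle_deg % 360.0))
            return len(self.clusters) - 1
        c = self.clusters[best]
        c.count += 1
        rate = 1.0 / c.count
        # Running mean of the embedding.
        c.embedding = [m + rate * (x - m) for m, x in zip(c.embedding, embedding)]
        # Move the angle along the shortest arc so 359 -> 1 degrees wraps correctly.
        diff = (angle_deg - c.angle_deg + 180.0) % 360.0 - 180.0
        c.angle_deg = (c.angle_deg + rate * diff) % 360.0
        return best
```

Because each cluster's angle is updated incrementally, a speaker who moves slowly stays in the same cluster while the location estimate follows, and the number of clusters grows only when a segment matches no known speaker. A real system would also need to retire clusters for participants who leave, which this sketch omits.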
