A Real-time Speaker Diarization System Based on Spatial Spectrum
Siqi Zheng, Weilong Huang, Xianliang Wang, Hongbin Suo, Jinwei Feng, Zhijie Yan

TL;DR
This paper presents a real-time speaker diarization system that combines spatial spectrum analysis from a differential directional microphone array with online speaker-location joint clustering and speaker number detection to identify, separate, and track speakers in far-field conditions where participants may move, overlap, enter, or leave.
Contribution
It introduces a novel spatial spectrum-based approach integrated with online clustering and speaker number detection for improved real-time speaker diarization.
Findings
Effective separation of overlapping speech.
Accurate real-time speaker tracking and identification.
Significant performance improvements over existing methods.
Abstract
In this paper we describe a speaker diarization system that enables localization and identification of all speakers present in a conversation or meeting. We propose a novel systematic approach to tackle several long-standing challenges in speaker diarization tasks: (1) to segment and separate overlapping speech from two speakers; (2) to estimate the number of speakers when participants may enter or leave the conversation at any time; (3) to provide accurate speaker identification on short text-independent utterances; (4) to track speakers' movement during the conversation; (5) to detect speaker change incidents in real time. First, a differential directional microphone array-based approach is exploited to capture the target speakers' voices in adverse far-field environments. Second, an online speaker-location joint clustering approach is proposed to keep track of speaker location. Third,…
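To make the idea of online speaker-location joint clustering more concrete, the following is a minimal sketch, not the authors' algorithm. It assumes each incoming speech segment arrives with a speaker embedding and a direction-of-arrival (DOA) angle in degrees, and it assigns the segment to the closest existing cluster under a weighted sum of cosine distance and angular distance, opening a new cluster when nothing is close enough. The class name OnlineJointClusterer and the parameters threshold, angle_weight, and rate are hypothetical and chosen only for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors, in [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def angular_distance(a_deg, b_deg):
    """Shortest arc between two angles, normalised to [0, 1]."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d) / 180.0


class OnlineJointClusterer:
    """Illustrative online clustering over (embedding, DOA) pairs.

    All parameters are assumptions made for this sketch, not values
    taken from the paper.
    """

    def __init__(self, threshold=0.3, angle_weight=0.5, rate=0.2):
        self.threshold = threshold        # max joint cost to join a cluster
        self.angle_weight = angle_weight  # weight of location vs. voice
        self.rate = rate                  # smoothing rate for updates
        self.clusters = []                # each: {"emb": list, "doa": float}

    def assign(self, emb, doa_deg):
        """Return the cluster index for one segment, creating one if needed."""
        best, best_cost = None, float("inf")
        w = self.angle_weight
        for i, c in enumerate(self.clusters):
            cost = ((1.0 - w) * cosine_distance(emb, c["emb"])
                    + w * angular_distance(doa_deg, c["doa"]))
            if cost < best_cost:
                best, best_cost = i, cost
        if best is None or best_cost > self.threshold:
            # No cluster is close enough: treat this as a new speaker.
            self.clusters.append({"emb": list(emb), "doa": doa_deg % 360.0})
            return len(self.clusters) - 1
        c = self.clusters[best]
        r = self.rate
        # Exponentially smooth the embedding centroid.
        c["emb"] = [(1.0 - r) * m + r * x for m, x in zip(c["emb"], emb)]
        # Move the stored angle along the shortest arc so that a slowly
        # moving speaker stays in the same cluster.
        delta = (doa_deg - c["doa"] + 180.0) % 360.0 - 180.0
        c["doa"] = (c["doa"] + r * delta) % 360.0
        return best


clusterer = OnlineJointClusterer()
first = clusterer.assign([1.0, 0.0], 10.0)     # opens cluster 0
second = clusterer.assign([0.0, 1.0], 200.0)   # different voice and location
third = clusterer.assign([0.99, 0.05], 14.0)   # close to cluster 0 on both
```

In this sketch the number of clusters doubles as a running estimate of the speaker count, and the smoothed DOA gives a crude form of movement tracking. A real system would also need rules for merging or retiring clusters when participants leave.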
