Spatio-spectral diarization of meetings by combining TDOA-based segmentation and speaker embedding-based clustering

Tobias Cord-Landwehr; Tobias Gburrek; Marc Deegen; Reinhold Haeb-Umbach

arXiv:2506.16228·eess.AS·September 1, 2025

TL;DR

This paper introduces a spatio-spectral diarization pipeline that combines TDOA-based segmentation with speaker embedding-based clustering. It outperforms the single-channel pyannote approach thanks to better handling of overlapping speech, tracks speakers who change position, and needs no prior knowledge of the number or placement of microphones.

Contribution

The paper presents a diarization pipeline that requires neither multi-channel training data nor knowledge of microphone placement, and that works with both compact microphone arrays and distributed microphones after minor adjustments.

Findings

01 Outperforms the single-channel pyannote approach in overlapping-speech scenarios
02 Correctly tracks speakers when they change position
03 Works with both compact and distributed microphone setups

Abstract

We propose a spatio-spectral, combined model-based and data-driven diarization pipeline consisting of TDOA-based segmentation followed by embedding-based clustering. The proposed system requires neither access to multi-channel training data nor prior knowledge about the number or placement of microphones. It works for both a compact microphone array and distributed microphones, with minor adjustments. Due to its superior handling of overlapping speech during segmentation, the proposed pipeline significantly outperforms the single-channel pyannote approach, both in a scenario with a compact microphone array and in a setup with distributed microphones. Additionally, we show that, unlike fully spatial diarization pipelines, the proposed system can correctly track speakers when they change positions.
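
The segmentation stage described above relies on time differences of arrival (TDOAs) between microphone pairs. The abstract does not say which estimator the authors use, so the following is only a minimal sketch under the assumption of the widely used GCC-PHAT estimator for a single microphone pair. The function name `gcc_phat_tdoa` and its parameters are hypothetical and are not taken from the fgnt/spatiospectral_diarization repository.

```python
import numpy as np


def gcc_phat_tdoa(x, y, fs, max_delay_s=None):
    """Estimate the delay of y relative to x, in seconds, using GCC-PHAT.

    Hypothetical helper for illustration only; not the authors' code.
    A positive result means y lags behind x.
    """
    # Zero-pad so the circular cross-correlation does not wrap around.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    # Cross-power spectrum with PHAT weighting: discard magnitude, keep phase.
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to physically plausible lags if a bound is given.
    max_shift = n // 2
    if max_delay_s is not None:
        max_shift = max(1, min(int(fs * max_delay_s), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs


if __name__ == "__main__":
    # Synthetic check: channel y is channel x delayed by 5 samples.
    rng = np.random.default_rng(0)
    fs = 16000
    x = rng.standard_normal(4096)
    y = np.concatenate((np.zeros(5), x[:-5]))
    print(gcc_phat_tdoa(x, y, fs, max_delay_s=0.002) * fs)
```

In a pipeline of the kind described, such per-frame TDOA estimates could be grouped over time to form segments with a consistent spatial signature, and the resulting segments then clustered by speaker embeddings. How the authors actually do this is not stated in the abstract.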

Code & Models

Repositories

fgnt/spatiospectral_diarization (official)