SDBench: A Comprehensive Benchmark Suite for Speaker Diarization

Eduardo Pacheco; Atila Orhon; Berkin Durmus; Blaise Munyampirwa; Andrey Leonov

arXiv:2507.16136 · cs.SD · August 7, 2025

TL;DR

SDBench is an open-source benchmark suite that standardizes the evaluation of speaker diarization systems across diverse datasets. It enables reproducible comparisons and supports the development of faster systems with comparable accuracy, such as SpeakerKit.

Contribution

The paper introduces SDBench, a comprehensive, easy-to-use benchmark suite for speaker diarization, and demonstrates its utility by using it to develop SpeakerKit, a faster system with error rates comparable to Pyannote v3.

Findings

01. SpeakerKit is 9.6x faster than Pyannote v3 with similar error rates.
02. Benchmarking reveals trade-offs between accuracy and speed among top systems.
03. SDBench enables reproducible, fine-grained analysis across multiple datasets.

Abstract

Even state-of-the-art speaker diarization systems exhibit high variance in error rates across different datasets, representing numerous use cases and domains. Furthermore, comparing across systems requires careful application of best practices such as dataset splits and metric definitions to allow for apples-to-apples comparison. We propose SDBench (Speaker Diarization Benchmark), an open-source benchmark suite that integrates 13 diverse datasets with built-in tooling for consistent and fine-grained analysis of speaker diarization performance for various on-device and server-side systems. SDBench enables reproducible evaluation and easy integration of new systems over time. To demonstrate the efficacy of SDBench, we built SpeakerKit, an inference efficiency-focused system built on top of Pyannote v3. SDBench enabled rapid execution of ablation studies that led to SpeakerKit being 9.6x…
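
The abstract's point about metric definitions can be made concrete with a small sketch. The excerpt above does not say which metric or settings SDBench uses, so the following is only an illustration of a commonly used quantity, the diarization error rate (DER), under simplifying assumptions: no overlapping speech, no forgiveness collar, frame-level scoring, and a brute-force speaker mapping that is only practical for a handful of speakers. The function names `frame_labels` and `der` are hypothetical and are not part of SDBench or SpeakerKit.

```python
from itertools import permutations


def frame_labels(segments, n_frames, step):
    """Rasterize (start, end, speaker) segments into one label per frame.

    A frame is labeled by the segment covering its midpoint; None means
    non-speech. Assumes segments do not overlap (illustrative simplification).
    """
    labels = [None] * n_frames
    for start, end, speaker in segments:
        for i in range(n_frames):
            midpoint = (i + 0.5) * step
            if start <= midpoint < end:
                labels[i] = speaker
    return labels


def der(reference, hypothesis, duration, step=0.01):
    """Simplified frame-level DER: (missed + false alarm + confusion) / reference speech."""
    n_frames = round(duration / step)
    ref = frame_labels(reference, n_frames, step)
    hyp = frame_labels(hypothesis, n_frames, step)

    total = sum(r is not None for r in ref)
    if total == 0:
        raise ValueError("reference contains no speech")

    # Missed speech and false alarms do not depend on the speaker mapping.
    missed = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))

    # Frames where both sides contain speech; confusion depends on the mapping.
    both = [(r, h) for r, h in zip(ref, hyp) if r is not None and h is not None]
    ref_speakers = sorted({r for r in ref if r is not None})
    hyp_speakers = sorted({h for h in hyp if h is not None})

    # Pad with unique placeholders so extra hypothesis speakers can stay unmatched.
    candidates = ref_speakers + [object() for _ in hyp_speakers]

    # Try every one-to-one mapping and keep the one with the least confusion.
    confusion = min(
        sum(dict(zip(hyp_speakers, perm))[h] != r for r, h in both)
        for perm in permutations(candidates, len(hyp_speakers))
    )
    return (missed + false_alarm + confusion) / total


reference = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hypothesis = [(0.0, 6.0, "x"), (6.0, 9.0, "y")]
print(round(der(reference, hypothesis, duration=10.0, step=0.1), 3))  # 0.2
```

In this toy case one second of speaker B is attributed to the wrong speaker and one second is missed, giving 2 s of error over 10 s of reference speech. Changing the collar, the treatment of overlap, or the frame step would change the number, which is why results computed under different metric definitions are not directly comparable.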

Taxonomy

Topics: Speech Recognition and Synthesis