pyannote.audio: neural building blocks for speaker diarization

Hervé Bredin; Ruiqing Yin; Juan Manuel Coria; Gregory Gelly; Pavel Korshunov; Marvin Lavechin; Diego Fustes; Hadrien Titeux; Wassim Bouaziz; Marie-Philippe Gill

arXiv:1911.01255·eess.AS·November 5, 2019

TL;DR

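pyannote.audio is an open-source Python toolkit that offers trainable neural building blocks for speaker diarization. The blocks can be combined into pipelines, and the pre-trained models reach state-of-the-art performance on most of the covered tasks (voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding).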

Contribution

The paper introduces a modular toolkit of trainable neural components, with pre-trained models for the sub-tasks of speaker diarization, to support both research and deployment.

Findings

01

Achieves state-of-the-art performance in voice activity detection.

02

Provides versatile neural building blocks for speaker diarization.

03

Includes pre-trained models covering diverse domains.

Abstract

We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.
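
To make the idea of combining building blocks concrete, the sketch below chains two toy stages: one turns frame-level voice activity scores into speech segments, the other splits those segments at speaker change points. This is not the pyannote.audio API. The function names (`binarize`, `split_at_changes`), the threshold, the frame step, and the scores are all hypothetical, and the hard-coded scores stand in for what a trained neural model would output.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def binarize(scores: List[float], frame_step: float,
             threshold: float = 0.5) -> List[Segment]:
    """Turn frame-level speech scores into (start, end) speech segments.

    Hypothetical helper: a frame counts as speech when its score is at
    or above the threshold; consecutive speech frames are merged.
    """
    segments: List[Segment] = []
    start = None
    for i, score in enumerate(scores):
        active = score >= threshold
        if active and start is None:
            start = i * frame_step          # a speech region opens
        elif not active and start is not None:
            segments.append((start, i * frame_step))  # region closes
            start = None
    if start is not None:                   # region still open at the end
        segments.append((start, len(scores) * frame_step))
    return segments


def split_at_changes(segments: List[Segment],
                     change_points: List[float]) -> List[Segment]:
    """Split each segment at change points that fall strictly inside it."""
    turns: List[Segment] = []
    for start, end in segments:
        cuts = [t for t in sorted(change_points) if start < t < end]
        bounds = [start] + cuts + [end]
        turns.extend(zip(bounds[:-1], bounds[1:]))
    return turns


# Made-up scores, one frame every 0.5 s, standing in for a model's output.
vad_scores = [0.1, 0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.9]
speech = binarize(vad_scores, frame_step=0.5)
turns = split_at_changes(speech, change_points=[1.0])
print(speech)
print(turns)
```

In a full diarization pipeline, the resulting speaker turns would then be represented by speaker embeddings and grouped by speaker; that step is omitted here.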
