TL;DR
pyannote.audio is an open-source Python toolkit, built on PyTorch, that provides trainable neural building blocks for speaker diarization. The blocks can be combined and jointly optimized into pipelines, and the bundled pre-trained models reach state-of-the-art performance on most of the sub-tasks they cover.
Contribution
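The paper introduces a modular toolkit of trainable neural building blocks, together with pre-trained models for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, to support both research and deployment.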
Findings
Achieves state-of-the-art performance in voice activity detection.
Provides versatile neural building blocks for speaker diarization.
Includes pre-trained models covering diverse domains.
Abstract
We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.
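To make the idea of combining building blocks concrete, here is a toy sketch in plain Python of how the outputs of two blocks named in the abstract (voice activity detection and speaker change detection) might be chained into speaker turns. It is not pyannote.audio's API: the function names, the threshold, the frame step, and the scores are all hypothetical, and the neural models that would produce such scores are left out entirely.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def speech_regions(scores: Sequence[float], threshold: float, step: float) -> List[Segment]:
    """Turn frame-level speech scores into (start, end) speech regions."""
    regions: List[Segment] = []
    start: Optional[int] = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a region opens on the first frame at or above the threshold
        elif score < threshold and start is not None:
            regions.append((start * step, i * step))
            start = None
    if start is not None:  # close a region that runs through the last frame
        regions.append((start * step, len(scores) * step))
    return regions


def split_at_changes(regions: Sequence[Segment], change_times: Sequence[float]) -> List[Segment]:
    """Split each speech region at the change points that fall strictly inside it."""
    turns: List[Segment] = []
    for start, end in regions:
        cuts = [t for t in sorted(change_times) if start < t < end]
        bounds = [start, *cuts, end]
        turns.extend(zip(bounds[:-1], bounds[1:]))
    return turns


# Hypothetical per-frame speech scores, one frame every 0.5 s (made-up values).
vad_scores = [0.1, 0.9, 0.8, 0.9, 0.2, 0.1, 0.7, 0.9, 0.8, 0.1]
regions = speech_regions(vad_scores, threshold=0.5, step=0.5)
# Hypothetical speaker change points, in seconds.
turns = split_at_changes(regions, change_times=[1.0, 4.0])
print(regions)
print(turns)
```

In a full diarization pipeline the resulting turns would then be assigned to speakers, for example by clustering speaker embeddings; that step is omitted here.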
