A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR

Giovanni Morrone; Enrico Zovato; Fabio Brugnara; Enrico Sartori; Leonardo Badino

arXiv:2409.05750·eess.AS·September 10, 2024

Open Access

TL;DR

This paper introduces a flexible, modular toolkit for joint speaker diarization and identification that integrates with speech recognition systems to produce speaker-attributed transcriptions across diverse conditions and applications.

Contribution

The paper presents a configurable toolkit capable of combining multiple models for joint speaker diarization and identification, adaptable to various scenarios and integrated with ASR systems.

Findings

01. Effective in diverse acoustic and language conditions

02. Supports multiple registered speaker sets

03. Generates speaker-attributed transcriptions

Abstract

We present a modular toolkit for joint speaker diarization and speaker identification. The toolkit can leverage multiple models and algorithms, which are defined in a configuration file. This flexibility allows the system to work in various conditions (e.g., multiple sets of registered speakers, acoustic conditions, and languages) and across application domains (e.g., media monitoring, institutional, speech analytics). In this demonstration, we show a practical use case in which speaker-related information is used jointly with automatic speech recognition engines to generate speaker-attributed transcriptions. To this end, we employ a user-friendly web-based interface to process audio and video inputs with the chosen configuration.
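
To make the idea of a speaker-attributed transcription concrete, the following minimal sketch shows one common way to combine diarization/identification output with time-stamped ASR words: each word is assigned to the speaker segment it overlaps most, and consecutive words from the same speaker are merged into a turn. This is an illustration under stated assumptions, not the toolkit's actual implementation. The `Segment` and `Word` types, the `attribute` function, the speaker labels, and the timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """A diarization/identification segment (hypothetical structure)."""
    start: float
    end: float
    speaker: str  # an enrolled speaker name or an anonymous cluster label


@dataclass
class Word:
    """A time-stamped ASR word (hypothetical structure)."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Duration in seconds shared by two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(word: Word, segments: List[Segment], unknown: str = "unknown") -> str:
    """Pick the speaker whose segment overlaps the word the most."""
    best_label, best_overlap = unknown, 0.0
    for seg in segments:
        ov = overlap(word.start, word.end, seg.start, seg.end)
        if ov > best_overlap:
            best_label, best_overlap = seg.speaker, ov
    return best_label


def attribute(words: List[Word], segments: List[Segment]) -> List[Tuple[str, str]]:
    """Merge consecutive words with the same speaker into (speaker, text) turns."""
    turns: List[Tuple[str, str]] = []
    for word in words:
        speaker = assign_speaker(word, segments)
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + word.text)
        else:
            turns.append((speaker, word.text))
    return turns


# Illustrative data: "alice" stands for an identified (registered) speaker,
# "spk_1" for a diarized speaker that matched no registered identity.
segments = [Segment(0.0, 2.0, "alice"), Segment(2.0, 4.0, "spk_1")]
words = [
    Word(0.2, 0.6, "good"),
    Word(0.7, 1.2, "morning"),
    Word(2.1, 2.5, "hello"),
    Word(4.5, 4.9, "thanks"),  # falls outside every segment
]
transcript = attribute(words, segments)
for speaker, text in transcript:
    print(f"{speaker}: {text}")
```

Maximum-overlap assignment is only one possible design choice. Words that straddle a speaker change, or that fall outside every segment (labelled "unknown" here), may need different handling in a real system.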

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing