BUT System Description for DIHARD Speech Diarization Challenge 2019

Federico Landini; Shuai Wang; Mireia Diez; Lukáš Burget; Pavel Matějka; Kateřina Žmolíková; Ladislav Mošner; Oldřich Plchot; Ondřej Novotný; Hossein Zeinali; Johan Rohdin

arXiv:1910.08847 · eess.AS · October 22, 2019 · 19 cites

TL;DR

This paper describes the BUT team's speaker diarization systems for the four tracks of the DIHARD 2019 challenge. All systems cluster x-vectors with agglomerative hierarchical clustering (AHC); for tracks 1 and 2, the clustering is followed by a Bayesian HMM.

Contribution

It presents track-specific diarization systems: AHC over x-vectors followed by a Bayesian HMM with eigenvoice priors for tracks 1 and 2, and AHC over x-vectors extracted on all channels for tracks 3 and 4.

Findings

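01 Following AHC over x-vectors with a Bayesian HMM, applied first at the x-vector level and then at the frame level, improved diarization performance in tracks 1 and 2.

02 The systems achieved competitive results in the DIHARD 2019 challenge.

03 Extracting x-vectors on all channels for AHC enhanced speaker segmentation in tracks 3 and 4.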

Abstract

This paper describes the systems developed by the BUT team for the four tracks of the second DIHARD speech diarization challenge. For tracks 1 and 2, the systems performed agglomerative hierarchical clustering (AHC) over x-vectors, then applied a Bayesian Hidden Markov Model (HMM) with eigenvoice priors at the x-vector level, and finally applied the same approach at the frame level. For tracks 3 and 4, the systems were based on performing AHC using x-vectors extracted on all channels.
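
To illustrate the first stage described in the abstract, the following is a minimal sketch of AHC over segment embeddings. It is not the BUT implementation. The abstract does not say which similarity measure, linkage rule, or stopping criterion was used, so cosine similarity, average linkage, the `threshold` parameter, and the toy 3-dimensional vectors are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ahc(embeddings, threshold):
    """Average-linkage AHC; returns one cluster label per embedding.

    Merging stops once the best average pairwise similarity between any
    two clusters falls below `threshold` (a hypothetical tuning knob).
    """
    n = len(embeddings)
    # Pairwise similarities are computed once up front.
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)]
           for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per segment

    def linkage(c1, c2):
        # Average similarity over all cross-cluster pairs.
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_score, best_pair = -math.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = linkage(clusters[a], clusters[b])
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        if best_score < threshold:
            break  # remaining clusters are too dissimilar to merge
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy stand-ins for x-vectors: two segments from each of two speakers.
toy = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
]
labels = ahc(toy, threshold=0.5)
print(labels)  # -> [0, 0, 1, 1]
```

In the systems described above, the resulting speaker labels would not be the final output for tracks 1 and 2; the Bayesian HMM stage follows the clustering. The stopping threshold controls how many speakers are found, so a real system would tune it on development data.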

Code & Models

Repositories

BUTSpeechFIT/VBx

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques