# LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization

**Authors:** Qingjian Lin, Ruiqing Yin, Ming Li, Hervé Bredin, Claude Barras

arXiv: 1907.10393 · 2019-12-02

## TL;DR

This paper introduces a supervised Bi-LSTM-based similarity measurement, combined with spectral clustering, for speaker diarization. It reports a diarization error rate of 6.63% on NIST SRE 2000 CALLHOME, significantly outperforming state-of-the-art methods.

## Contribution

It proposes a supervised Bi-LSTM that scores the similarity between all segments of a recording, in place of traditional scoring back-ends such as PLDA, and applies spectral clustering to the resulting similarity matrix.
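To make the data flow concrete, here is a minimal Python sketch of how a sequence model can fill a similarity matrix row by row. It assumes (this summary does not say so) that row i is produced by feeding the sequence of concatenated pairs [x_i; x_1], ..., [x_i; x_N] to the network and reading one score per position. The trained Bi-LSTM is replaced by a hypothetical `placeholder_scorer` that scores each pair independently with rescaled cosine similarity, and the final symmetrization step is also an assumption of this sketch.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def placeholder_scorer(pairs: List[Vector]) -> List[float]:
    """Hypothetical stand-in for a trained Bi-LSTM.

    Takes a sequence of concatenated [x_i; x_j] vectors and returns one score
    per position. Each pair is scored on its own (cosine mapped to [0, 1]);
    a real Bi-LSTM would also use context from the rest of the sequence.
    """
    scores = []
    for pair in pairs:
        half = len(pair) // 2
        scores.append((cosine(pair[:half], pair[half:]) + 1.0) / 2.0)
    return scores


def similarity_matrix(
    embeddings: List[Vector],
    scorer: Callable[[List[Vector]], List[float]],
) -> List[List[float]]:
    """Build an N x N similarity matrix, one row per scorer call."""
    n = len(embeddings)
    rows = []
    for i in range(n):
        # Row i is scored from the sequence [x_i; x_1], [x_i; x_2], ..., [x_i; x_n].
        seq = [embeddings[i] + embeddings[j] for j in range(n)]
        rows.append(scorer(seq))
    # Assumed post-processing: average S and its transpose so S[i][j] == S[j][i].
    return [[(rows[i][j] + rows[j][i]) / 2.0 for j in range(n)] for i in range(n)]
```

With a real model, `placeholder_scorer` would be swapped for the network's forward pass. Because a bidirectional network reads the whole row, each score can depend on the other segments in the recording, which a strictly pairwise scorer such as cosine or PLDA cannot exploit.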

## Key findings

- Achieved a diarization error rate of 6.63% on NIST SRE 2000 CALLHOME.
- Significantly outperformed state-of-the-art diarization methods on this benchmark.
- Demonstrated the effectiveness of a supervised Bi-LSTM for similarity scoring.
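The 6.63% figure is a diarization error rate (DER). As a reminder of what it measures, the standard definition divides the total duration of false-alarm speech, missed speech and speaker confusion by the total duration of reference speech. The small sketch below only encodes that ratio; the function name is hypothetical, the durations in the test are made-up illustration values rather than results from the paper, and scoring conventions (such as a forgiveness collar around segment boundaries) change the component durations in ways this summary does not specify.

```python
def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_speech: float) -> float:
    """DER = (false alarm + missed speech + speaker confusion) / total reference speech.

    All arguments are durations in seconds; the result is a fraction
    (multiply by 100 to report it as a percentage).
    """
    if total_speech <= 0:
        raise ValueError("total reference speech must be positive")
    return (false_alarm + missed + confusion) / total_speech
```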

## Abstract

More and more neural network approaches have achieved considerable improvement upon submodules of speaker diarization systems, including speaker change detection and segment-wise speaker embedding extraction. Still, in the clustering stage, traditional algorithms like probabilistic linear discriminant analysis (PLDA) are widely used for scoring the similarity between two speech segments. In this paper, we propose a supervised method to measure the similarity matrix between all segments of an audio recording with sequential bidirectional long short-term memory networks (Bi-LSTM). Spectral clustering is applied on top of the similarity matrix to further improve the performance. Experimental results show that our system significantly outperforms the state-of-the-art methods and achieves a diarization error rate of 6.63% on the NIST SRE 2000 CALLHOME database.
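The clustering stage can be illustrated with textbook normalized spectral clustering (in the style of Ng, Jordan and Weiss), which is not necessarily the exact variant used in the paper. This pure-Python sketch assumes the number of speakers `k` is known. It normalizes the affinity matrix by node degree, takes the eigenvectors of the `k` largest eigenvalues (computed with a small cyclic Jacobi routine to avoid external dependencies), normalizes each row, and groups the rows with k-means. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def jacobi_eigh(a: Matrix, tol: float = 1e-12, max_sweeps: int = 100) -> Tuple[List[float], Matrix]:
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V) where column c of V is the eigenvector for eigenvalues[c].
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def _dist2(x: List[float], y: List[float]) -> float:
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def kmeans(points: Matrix, k: int, iters: int = 50) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0][:]]
    while len(centers) < k:
        far = max(points, key=lambda pt: min(_dist2(pt, ctr) for ctr in centers))
        centers.append(far[:])
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: _dist2(pt, centers[c])) for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def spectral_clustering(sim: Matrix, k: int) -> List[int]:
    """Assign a cluster (speaker) label to each segment from a similarity matrix."""
    n = len(sim)
    aff = [[sim[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in aff]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    # M = D^{-1/2} A D^{-1/2}; its top eigenvectors span the cluster structure.
    m = [[inv_sqrt[i] * aff[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigh(m)
    top = sorted(range(n), key=lambda c: vals[c], reverse=True)[:k]
    emb = []
    for i in range(n):
        row = [vecs[i][c] for c in top]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        emb.append([x / norm for x in row])
    return kmeans(emb, k)
```

In practice `k` is often estimated from the eigenvalue spectrum (for example, from the largest gap between consecutive sorted eigenvalues), and a library routine such as NumPy's `numpy.linalg.eigh` would replace the hand-written Jacobi loop.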

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.10393/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1907.10393/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/1907.10393/full.md

---
Source: https://tomesphere.com/paper/1907.10393