# Constrained speaker diarization of TV series based on visual patterns

**Authors:** Xavier Bost (LIA), Georges Linares (LIA)

arXiv: 1812.07209 · 2019-04-22

## TL;DR

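This paper presents a two-step speaker diarization method for TV series. Speakers are first diarized locally within scenes that visual patterns identify as dialogues, then merged across scenes by constrained agglomerative clustering. The method targets the variable acoustic conditions of fiction and is reported to outperform standard diarization tools.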

## Contribution

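The paper uses the visual patterns of dialogue scenes to localize a first diarization pass, then merges the resulting local speakers in a second agglomerative clustering step under the constraint that speakers hypothesized to be distinct within a scene are never assigned to the same cluster.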

## Key findings

- Improved diarization accuracy over standard tools.
- Visual patterns identify dialogue scenes, and the local diarization of each scene constrains the final speaker clustering.
- Method applicable to complex fictional film audio environments.

## Abstract

Speaker diarization, usually denoted as the "who spoke when" task, turns out to be particularly challenging when applied to fictional films, where many characters talk in various acoustic conditions (background music, sound effects...). Despite this acoustic variability, such movies exhibit specific visual patterns in the dialogue scenes. In this paper, we introduce a two-step method to achieve speaker diarization in TV series: a speaker diarization is first performed locally in the scenes detected as dialogues; then, the hypothesized local speakers are merged in a second agglomerative clustering process, with the constraint that speakers locally hypothesized to be distinct must not be assigned to the same cluster. The performances of our approach are compared to those obtained by standard speaker diarization tools applied to the same data.
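
The second step described in the abstract can be illustrated with a minimal sketch of agglomerative clustering under cannot-link constraints. This is not the authors' implementation: the feature vectors, the Euclidean distance, the average linkage, the distance threshold used as a stopping criterion, and the function name `constrained_agglomerative` are all assumptions made for illustration, since this summary does not specify them.

```python
import math
from itertools import combinations


def constrained_agglomerative(points, scene_ids, threshold):
    """Merge local speakers across scenes under cannot-link constraints.

    points:    one feature vector per local speaker (hypothetical embeddings)
    scene_ids: index of the dialogue scene each local speaker comes from
    threshold: assumed stopping criterion on the average-linkage distance
    """
    clusters = [[i] for i in range(len(points))]

    def allowed(a, b):
        # Speakers found in the same scene were hypothesized to be distinct,
        # so two clusters sharing a scene must not be merged.
        scenes_a = {scene_ids[i] for i in a}
        return all(scene_ids[j] not in scenes_a for j in b)

    def distance(a, b):
        # Average linkage over Euclidean distances (an assumption).
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while True:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            if not allowed(clusters[x], clusters[y]):
                continue
            d = distance(clusters[x], clusters[y])
            if d <= threshold and (best is None or d < best[0]):
                best = (d, x, y)
        if best is None:
            break  # no permitted pair is close enough to merge
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy data: speakers 0 and 1 come from scene 0, speakers 2 and 3 from scene 1.
points = [(0.0, 0.0), (1.0, 0.0), (0.2, 0.0), (5.0, 5.0)]
scene_ids = [0, 0, 1, 1]
labels = constrained_agglomerative(points, scene_ids, threshold=1.5)
print(labels)  # [0, 1, 0, 2]
```

In this toy example, local speakers 0 and 1 are acoustically close enough to merge under the threshold, but they stay in separate clusters because they were hypothesized to be distinct within the same scene. Speaker 2, from another scene, is merged with speaker 0.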

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.07209/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1812.07209/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/1812.07209/full.md

---
Source: https://tomesphere.com/paper/1812.07209