Linguistically Aided Speaker Diarization Using Speaker Role Information

Nikolaos Flemotomos; Panayiotis Georgiou; Shrikanth Narayanan

arXiv:1911.07994·eess.AS·February 9, 2021

TL;DR

This paper introduces a linguistically aided speaker diarization method that leverages speaker role information to improve robustness and accuracy in conversational scenarios, especially under noisy conditions.

Contribution

The paper uses linguistic features tied to speaker roles to turn the clustering step of diarization into a classification problem, which improves diarization performance.

Findings

01. Improved diarization accuracy in psychotherapy interactions.

02. Robustness against noisy audio conditions.

03. Effective use of linguistic and role-based features.

Abstract

Speaker diarization relies on the assumption that speech segments corresponding to a particular speaker are concentrated in a specific region of the speaker space; a region which represents that speaker's identity. These identities are not known a priori, so a clustering algorithm is typically employed, which is traditionally based solely on audio. Under noisy conditions, however, such an approach poses the risk of generating unreliable speaker clusters. In this work we aim to utilize linguistic information as a supplemental modality to identify the various speakers in a more robust way. We are focused on conversational scenarios where the speakers assume distinct roles and are expected to follow different linguistic patterns. This distinct linguistic variability can be exploited to help us construct the speaker identities. That way, we are able to boost the diarization performance by…
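
The idea of replacing clustering with classification can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each speech segment already has an audio embedding, a role label predicted from its transcript, and a confidence score for that label. The function names, the `min_conf` threshold, the therapist/patient roles, and the toy numbers are all hypothetical. Segments with confident text-based labels are averaged into one audio profile per role, and every segment is then assigned to the profile with the highest cosine similarity.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def build_role_profiles(embeddings, text_roles, confidences, roles, min_conf=0.8):
    """Average the audio embeddings of segments with a confident text-based role.

    min_conf is an illustrative threshold, not a value from the paper.
    """
    unit = l2_normalize(embeddings)
    profiles = []
    for role in roles:
        mask = (text_roles == role) & (confidences >= min_conf)
        if not mask.any():
            raise ValueError(f"no confident segments for role {role!r}")
        profiles.append(unit[mask].mean(axis=0))
    return l2_normalize(np.stack(profiles))


def classify_segments(embeddings, profiles, roles):
    """Assign each segment to the role whose profile is most similar (cosine)."""
    sims = l2_normalize(embeddings) @ profiles.T
    return np.asarray(roles)[sims.argmax(axis=1)]


# Toy data (hypothetical): six segments with 2-D "embeddings".
roles = ["therapist", "patient"]
embeddings = np.array([
    [1.0, 0.1], [0.9, 0.2],   # confidently labeled as therapist from text
    [0.1, 1.0], [0.2, 0.8],   # confidently labeled as patient from text
    [0.8, 0.3], [0.3, 0.9],   # low-confidence text labels
])
text_roles = np.array(
    ["therapist", "therapist", "patient", "patient", "patient", "therapist"]
)
confidences = np.array([0.95, 0.90, 0.92, 0.85, 0.40, 0.45])

profiles = build_role_profiles(embeddings, text_roles, confidences, roles)
labels = classify_segments(embeddings, profiles, roles)
print(labels.tolist())
```

In this toy example the last two segments carry unreliable text labels, so they are excluded from the profiles and are labeled only by their audio similarity to the role profiles. A real system would replace the toy arrays with embeddings from a speaker encoder and role predictions from a text classifier.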
