Linguistically Aided Speaker Diarization Using Speaker Role Information
Nikolaos Flemotomos, Panayiotis Georgiou, Shrikanth Narayanan

TL;DR
This paper introduces a linguistically aided speaker diarization method that leverages speaker role information to improve robustness and accuracy in conversational scenarios, especially under noisy conditions.
Contribution
The paper uses linguistic features tied to speaker roles to help construct speaker identities, so that the clustering step of diarization can be recast as a classification task.
Findings
Improved diarization accuracy in psychotherapy interactions.
Robustness against noisy audio conditions.
Effective use of linguistic and role-based features.
Abstract
Speaker diarization relies on the assumption that speech segments corresponding to a particular speaker are concentrated in a specific region of the speaker space; a region which represents that speaker's identity. These identities are not known a priori, so a clustering algorithm is typically employed, which is traditionally based solely on audio. Under noisy conditions, however, such an approach poses the risk of generating unreliable speaker clusters. In this work we aim to utilize linguistic information as a supplemental modality to identify the various speakers in a more robust way. We are focused on conversational scenarios where the speakers assume distinct roles and are expected to follow different linguistic patterns. This distinct linguistic variability can be exploited to help us construct the speaker identities. That way, we are able to boost the diarization performance by…
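The idea of replacing clustering with classification can be illustrated with a toy sketch. This is not the authors' implementation; it only assumes that some segments have already received a role label from their transcribed text (the hypothetical `role_hints` list, with `None` where the text gave no usable cue) and that every segment has an audio embedding. The role names, the function names, and the two-dimensional embeddings are all made up for illustration. Role profiles are built by averaging the embeddings of the hinted segments, and every segment is then assigned to the closest profile by cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def role_profiles(embeddings, role_hints):
    """Average the audio embeddings of segments whose role was inferred from text.

    role_hints[i] is a role name or None (assumption: produced by some
    text-based role recognizer that is not shown here).
    """
    sums, counts = {}, {}
    for emb, role in zip(embeddings, role_hints):
        if role is None:
            continue
        acc = sums.setdefault(role, [0.0] * len(emb))
        for i, x in enumerate(emb):
            acc[i] += x
        counts[role] = counts.get(role, 0) + 1
    return {role: [x / counts[role] for x in acc] for role, acc in sums.items()}


def classify_segments(embeddings, profiles):
    """Assign each segment to the role profile with the highest cosine similarity."""
    return [
        max(profiles, key=lambda role: cosine(emb, profiles[role]))
        for emb in embeddings
    ]


# Toy data: five segments, two of which carry a text-derived role hint.
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8], [0.8, 0.3]]
role_hints = ["therapist", None, "patient", None, None]

profiles = role_profiles(embeddings, role_hints)
labels = classify_segments(embeddings, profiles)
print(labels)
```

Because the profiles are anchored to roles rather than discovered by clustering, each segment receives a named label directly, and no step is needed to map anonymous clusters to speakers. How the actual system derives its profiles and makes its decisions is described in the paper itself, not here.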
