Speaker attribution with voice profiles by graph-based semi-supervised learning

Jixuan Wang; Xiong Xiao; Jian Wu; Ranjani Ramamurthy; Frank Rudzicz; Michael Brudno

arXiv:2102.03634·eess.AS·February 9, 2021

TL;DR

This paper introduces a graph-based semi-supervised learning approach to speaker attribution in meetings. It combines pretrained speaker embeddings with the structure of a per-session speech segment graph and reduces speaker attribution error by up to 68% compared with baseline methods.

Contribution

It proposes a novel application of graph neural networks and label propagation for speaker attribution, utilizing structural information from speech segment graphs.

Findings

01. Reduced speaker attribution error by up to 68%

02. Effective utilization of speaker embeddings and graph structure

03. Improved performance over baseline methods

Abstract

Speaker attribution is required in many real-world applications, such as meeting transcription, where speaker identity is assigned to each utterance according to speaker voice profiles. In this paper, we propose to solve the speaker attribution problem by using graph-based semi-supervised learning methods. A graph of speech segments is built for each session, on which segments from voice profiles are represented by labeled nodes while segments from test utterances are unlabeled nodes. The weight of edges between nodes is evaluated by the similarities between the pretrained speaker embeddings of speech segments. Speaker attribution then becomes a semi-supervised learning problem on graphs, on which two graph-based methods are applied: label propagation (LP) and graph neural networks (GNNs). The proposed approaches are able to utilize the structural information of the graph to improve…
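As an illustration of the graph construction and label propagation described in the abstract, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes cosine similarity as the edge weight and uses the standard label spreading iteration F = αSF + (1 − α)Y of Zhou et al.; the abstract does not say which LP variant the paper uses. The function name `label_propagation` and the parameters `alpha` and `n_iter` are hypothetical.

```python
import numpy as np

def label_propagation(embeddings, labels, n_speakers, alpha=0.9, n_iter=50):
    """Assign a speaker to every segment of one session.

    embeddings: (n, d) array of pretrained speaker embeddings, one per segment.
    labels: length-n integer array; speaker index for voice-profile segments,
            -1 for segments from test utterances (unlabeled nodes).
    """
    # Edge weights: cosine similarity between L2-normalised embeddings
    # (an assumption; the abstract only says "similarities").
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = np.clip(x @ x.T, 0.0, None)   # drop negative similarities
    np.fill_diagonal(w, 0.0)          # no self-loops

    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}.
    degree = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    s = w * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # One-hot seed matrix: rows of unlabeled nodes stay all-zero.
    y = np.zeros((len(labels), n_speakers))
    labeled = labels >= 0
    y[labeled, labels[labeled]] = 1.0

    # Iterate F <- alpha * S F + (1 - alpha) * Y.
    f = y.copy()
    for _ in range(n_iter):
        f = alpha * (s @ f) + (1.0 - alpha) * y

    # Each node takes the speaker with the highest propagated score.
    return f.argmax(axis=1)
```

Calling `label_propagation(embeddings, labels, n_speakers)` on one session returns a predicted speaker index for every node, profile segments included. Because scores flow along edges between unlabeled segments as well as from profiles, a test segment can borrow evidence from similar test segments, which is the structural information a per-utterance comparison against profiles does not use.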
