Graph Neural Network Backend for Speaker Recognition

Liang He, Ruida Li, and Mengqi Niu

arXiv:2308.08767·eess.AS·August 21, 2023


Open Access

TL;DR

This paper introduces a graph neural network backend for speaker recognition that leverages local relationships among embeddings, significantly improving recognition accuracy over traditional similarity-based methods.

Contribution

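The paper proposes a GNN-based backend that models embeddings as nodes in a graph, with edges computed from a similarity function such as cosine, LDA+cosine, or LDA+PLDA, so that latent relationships among embeddings can be used to improve speaker recognition performance.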

Findings

01. GNN backend outperforms traditional methods on multiple datasets.

02. Different graph settings and GNN variants improve recognition accuracy.

03. Experimental results demonstrate significant performance gains.

Abstract

Currently, most speaker recognition backends, such as cosine scoring, linear discriminant analysis (LDA), or probabilistic linear discriminant analysis (PLDA), make decisions by computing a similarity or distance between enrollment and test embeddings that have already been extracted by neural networks. However, the local structure formed by each embedding and its neighbors in the low-dimensional space differs from embedding to embedding; this structure may help recognition but is often ignored. To take advantage of it, we propose a graph neural network (GNN) backend that mines latent relationships among embeddings for classification. We treat all embeddings as nodes of a graph and compute their edges with a similarity function such as cosine, LDA+cosine, or LDA+PLDA. We study different graph settings and explore variants of GNN to find a better message passing and…
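The graph construction described in the abstract (embeddings as nodes, edges from a similarity function) can be illustrated with a minimal sketch. This is not the authors' implementation: the k-nearest-neighbor edge rule, the parameter-free mean aggregation step, the function names, and the random toy embeddings are all assumptions chosen for illustration, and only cosine similarity is shown.

```python
import numpy as np


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def knn_adjacency(sim, k):
    """Symmetric 0/1 adjacency linking each node to its k most similar nodes.

    The k-NN rule is an assumption; the paper studies its own graph settings.
    Requires k <= number of nodes - 1.
    """
    n = sim.shape[0]
    masked = sim.copy()
    np.fill_diagonal(masked, -np.inf)  # never pick a node as its own neighbor
    idx = np.argsort(-masked, axis=1)[:, :k]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, idx.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def mean_aggregate(x, adj):
    """One message-passing step: average each node with its neighbors.

    A parameter-free stand-in for a learned GNN layer (assumption).
    """
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)
    return (a_hat @ x) / deg


# Toy data: two hypothetical speakers, four 8-dimensional embeddings each.
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 8)) * 5.0
embeddings = np.vstack([c + rng.normal(size=(4, 8)) for c in centers])

sim = cosine_similarity_matrix(embeddings)
adj = knn_adjacency(sim, k=2)
smoothed = mean_aggregate(embeddings, adj)
```

In a trained backend the averaging step would be replaced by learned GNN layers followed by a classifier; the sketch only shows how neighborhood structure enters the representation of each embedding.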

Taxonomy

Topics: Speech Recognition and Synthesis · Text and Document Classification Technologies · Topic Modeling

Methods: Graph Neural Network