Graph Neural Network Backend for Speaker Recognition
Liang He, Ruida Li, and Mengqi Niu

TL;DR
This paper introduces a graph neural network backend for speaker recognition that leverages local relationships among embeddings, significantly improving recognition accuracy over traditional similarity-based methods.
Contribution
It proposes a novel GNN-based backend that models embeddings as nodes in a graph, capturing latent relationships to enhance speaker recognition performance.
Findings
The GNN backend outperforms traditional similarity-based backends on multiple datasets.
The choice of graph setting and GNN variant affects recognition accuracy.
Experimental results demonstrate significant performance gains.
Abstract
Currently, most speaker recognition backends, such as cosine scoring, linear discriminant analysis (LDA), or probabilistic linear discriminant analysis (PLDA), make decisions by calculating the similarity or distance between enrollment and test embeddings that have already been extracted by neural networks. However, the local structure formed by each embedding and its neighboring embeddings in the low-dimensional space differs from one embedding to the next; this structure may help recognition but is often ignored. To take advantage of it, we propose a graph neural network (GNN) backend that mines latent relationships among embeddings for classification. We treat all the embeddings as nodes on a graph and compute their edges with a similarity function, such as cosine, LDA+cosine, or LDA+PLDA. We study different graph settings and explore variants of GNN to find a better message passing and…
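The graph construction described in the abstract (embeddings as nodes, edges from a similarity function) can be illustrated with a small sketch. This is not the authors' implementation: the k-nearest-neighbour rule, the value of `k`, the toy data, and the GCN-style propagation step are all assumptions chosen for illustration, and the function names `cosine_knn_graph` and `gcn_layer` are hypothetical. Only the cosine option is shown; the LDA+cosine and LDA+PLDA variants would replace the similarity computation.

```python
import numpy as np


def cosine_knn_graph(embeddings, k=2):
    """Build a symmetric k-nearest-neighbour adjacency matrix from cosine similarity.

    Assumption: edges connect each node to its k most similar nodes. The
    abstract only says edges come from a similarity function.
    """
    # L2-normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # never pick a node as its own neighbour

    n = sim.shape[0]
    neighbours = np.argsort(-sim, axis=1)[:, :k]  # top-k most similar per row
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def gcn_layer(adj, features, weight):
    """One GCN-style message-passing step: ReLU(D^-1/2 (A + I) D^-1/2 X W).

    Assumption: a generic graph-convolution layer stands in for whichever
    GNN variants the paper explores.
    """
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)


# Toy data (made up): two clusters of four 8-dimensional "speaker embeddings".
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 8))
embeddings = np.vstack([
    centers[0] + 0.05 * rng.normal(size=(4, 8)),
    centers[1] + 0.05 * rng.normal(size=(4, 8)),
])

adj = cosine_knn_graph(embeddings, k=2)
weight = rng.normal(size=(8, 4))  # random, untrained projection
hidden = gcn_layer(adj, embeddings, weight)
```

After one propagation step, each row of `hidden` mixes an embedding with its graph neighbours, which is the sense in which a GNN backend can use local structure that pairwise scoring ignores. A trained model would learn `weight` and add a classification head on top.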
Taxonomy
Topics: Speech Recognition and Synthesis · Text and Document Classification Technologies · Topic Modeling
Methods: Graph Neural Network
