Graph Attention Networks for Speaker Verification
Jee-weon Jung, Hee-Soo Heo, Ha-Jin Yu, Joon Son Chung

TL;DR
This paper introduces a graph attention network-based back-end for speaker verification that models segment-wise speaker embeddings as graph nodes, reducing equal error rate by an average of 20% compared with a cosine similarity baseline.
Contribution
Uses graph attention networks to treat segment-wise speaker embeddings as graph nodes and to output a similarity score directly, improving speaker verification accuracy.
Findings
Achieved an average 20% reduction in equal error rate over the cosine similarity baseline.
Validated effectiveness across three different speaker embedding extractors.
Demonstrated consistent performance improvements with the proposed framework.
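To make the metric in these findings concrete, here is a minimal sketch of a cosine similarity back-end and an approximate equal error rate (EER) computation. The function names are hypothetical, the threshold sweep is a simple approximation rather than the paper's evaluation code, and `relative_reduction` assumes the reported 20% is a relative reduction, which this summary does not state explicitly.

```python
import numpy as np

def cosine_score(enroll, test):
    """Cosine similarity between two utterance-level embeddings."""
    enroll = np.asarray(enroll, dtype=float)
    test = np.asarray(test, dtype=float)
    return float(enroll @ test / (np.linalg.norm(enroll) * np.linalg.norm(test)))

def equal_error_rate(scores, labels):
    """Approximate EER: sweep thresholds over the observed scores and
    return the operating point where false acceptance and false
    rejection rates are closest."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)  # True = same speaker
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        accept = scores >= t
        far = np.mean(accept[~labels])   # impostor trials accepted
        frr = np.mean(~accept[labels])   # target trials rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)

def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction (assumed interpretation of the 20% figure)."""
    return (baseline_eer - new_eer) / baseline_eer
```

With this definition, a hypothetical drop from an EER of 2.0% to 1.6% would count as a 20% relative reduction.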
Abstract
This work presents a novel back-end framework for speaker verification using graph attention networks. Segment-wise speaker embeddings extracted from multiple crops within an utterance are interpreted as node representations of a graph. The proposed framework takes segment-wise speaker embeddings from an enrollment and a test utterance as input and directly outputs a similarity score. We first construct a graph using the segment-wise speaker embeddings and then input it to graph attention networks. After a few graph attention layers with residual connections, each node is projected into a one-dimensional space using an affine transform, followed by a readout operation resulting in a scalar similarity score. To enable successful adaptation for speaker verification, we propose techniques such as separating trainable weights for attention map calculations between segment-wise speaker embeddings from…
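The pipeline described in the abstract (graph construction, attention layers with residual connections, per-node affine projection, readout) can be sketched roughly as follows. This is an illustrative NumPy forward pass under several assumptions, not the authors' implementation: the graph is taken to be fully connected, the attention form (element-wise products passed through `tanh`) and the mean readout are choices made here for illustration, and all function and parameter names are hypothetical. The weight-separation technique mentioned at the end of the abstract is omitted because the text is truncated.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention_layer(nodes, w_att, w_proj):
    """One simplified attention layer over a fully connected graph.

    nodes: (N, D) node features; w_att: (D,); w_proj: (D, D).
    """
    # Pairwise element-wise products, shape (N, N, D) (assumed attention form).
    pair = nodes[:, None, :] * nodes[None, :, :]
    logits = np.tanh(pair) @ w_att            # (N, N) attention logits
    att = softmax(logits, axis=1)             # each row sums to 1
    aggregated = att @ nodes                  # (N, D) weighted neighbor sum
    # Residual connection around the projected, ReLU-activated update.
    return nodes + np.maximum(aggregated @ w_proj, 0.0)

def init_params(dim, num_layers, seed=0):
    """Random parameters for the sketch (hypothetical layout)."""
    rng = np.random.default_rng(seed)
    layers = [(rng.normal(size=dim), 0.1 * rng.normal(size=(dim, dim)))
              for _ in range(num_layers)]
    return {"layers": layers, "w_out": rng.normal(size=dim), "b_out": 0.0}

def similarity_score(enroll_segs, test_segs, params):
    """Map enrollment and test segment embeddings to one scalar score."""
    # Graph nodes: all segment embeddings from both utterances.
    nodes = np.concatenate([enroll_segs, test_segs], axis=0)
    for w_att, w_proj in params["layers"]:
        nodes = graph_attention_layer(nodes, w_att, w_proj)
    # Affine projection of each node to one dimension, shape (N,).
    node_scores = nodes @ params["w_out"] + params["b_out"]
    # Readout: mean over nodes (an assumption; the abstract does not specify).
    return float(node_scores.mean())
```

For example, calling `similarity_score` with two arrays of shape `(5, 16)` and `init_params(16, 2)` yields a single float; in practice the parameters would be trained on verification trials rather than sampled at random.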
