The Graph feature fusion technique for speaker recognition based on wav2vec2.0 framework
Zirui Ge, Haiyan Guo, Zhen Yang

TL;DR
This paper introduces a novel graph neural network-based feature fusion method for speaker recognition using wav2vec2.0, outperforming traditional pooling techniques by capturing inter-feature relationships.
Contribution
It proposes a GNN-based approach to fuse wav2vec2.0 features, providing a theoretical proof of its superiority over classical pooling methods.
Findings
GNN fusion outperforms mean, max, and random pooling methods.
The approach shows relative performance improvements in speaker recognition tasks.
The method effectively captures feature relationships beyond temporal information.
Abstract
The pre-trained wav2vec2.0 model has proven effective for speaker recognition. However, current feature processing methods rely on classical pooling of the model's output features, such as mean pooling and max pooling. These methods treat the features as independent, unrelated units, ignore the inter-relationships among them, and do not treat the features as an overall representation of a speaker. The Gated Recurrent Unit (GRU), a feature fusion method that can also be regarded as a more complex pooling technique, focuses mainly on temporal information and may therefore perform poorly when the main information does not lie along the temporal dimension. In this paper, we investigate the graph neural network (GNN) as a backend processing module based on the wav2vec2.0 framework to provide a solution for the mentioned…
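To make the contrast between classical pooling and graph-based fusion concrete, here is a minimal sketch in plain Python. It is not the paper's architecture, which this summary does not describe. The function graph_fuse is hypothetical: it assumes a fully connected graph over frame-level features, non-negative cosine similarity as edge weights, a single parameter-free message-passing step, and a mean readout. The toy two-dimensional frames stand in for wav2vec2.0 output features.

```python
import math

def mean_pool(frames):
    """Average each feature dimension over all frames."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def max_pool(frames):
    """Take the per-dimension maximum over all frames."""
    return [max(col) for col in zip(*frames)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def graph_fuse(frames):
    """One hypothetical message-passing step followed by a mean readout.

    Assumption: edge weights are non-negative cosine similarities
    (self-loops included), row-normalised so that each node becomes a
    weighted average of the frames it resembles.
    """
    dim = len(frames[0])
    weights = [[max(cosine(u, v), 0.0) for v in frames] for u in frames]
    updated = []
    for i, row in enumerate(weights):
        total = sum(row)
        if total == 0.0:
            # An all-zero frame has no neighbours; keep it unchanged.
            updated.append(list(frames[i]))
            continue
        updated.append([
            sum(w * f[d] for w, f in zip(row, frames)) / total
            for d in range(dim)
        ])
    # Readout: collapse the updated nodes into one utterance-level vector.
    return mean_pool(updated)

# Toy stand-in for frame-level features (3 frames, 2 dimensions).
frames = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print("mean :", mean_pool(frames))
print("max  :", max_pool(frames))
print("graph:", graph_fuse(frames))
```

Mean and max pooling look at each dimension in isolation, while the graph step lets every frame's representation depend on how similar it is to the others before the readout. A trained GNN would add learnable weights and nonlinearities on top of this aggregation.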
Taxonomy
Topics: Speech Recognition and Synthesis · Text and Document Classification Technologies
Methods: Graph Neural Network · Max Pooling
