Spatial-temporal Graph Based Multi-channel Speaker Verification With Ad-hoc Microphone Arrays

Yijiang Chen, Chengdong Liang, and Xiao-Lei Zhang

arXiv:2307.01386·cs.SD·July 6, 2023·1 cites

Open Access

TL;DR

This paper introduces a spatial-temporal graph convolutional network for multi-channel speaker verification using ad-hoc microphone arrays, effectively reducing noise impact and improving accuracy in challenging acoustic environments.

Contribution

It presents a novel graph-based framework with feature aggregation and channel selection blocks for enhanced multi-channel speaker verification.

Findings

01. Achieves up to 17.70% relative EER reduction in real environments
02. Robust performance across various noise and reverberation conditions
03. Outperforms six baseline methods in experiments

Abstract

The performance of speaker verification degrades significantly in adverse acoustic environments with strong reverberation and noise. To address this issue, this paper proposes a spatial-temporal graph convolutional network (GCN) method for multi-channel speaker verification with ad-hoc microphone arrays. The method includes a feature aggregation block and a channel selection block, both of which are built on graphs. The feature aggregation block uses a spatial-temporal GCN to fuse speaker features across different time steps and channels. The graph-based channel selection block discards noisy channels that may contribute negatively to the system. The proposed method is flexible in incorporating various kinds of graphs and prior knowledge. We compared the proposed method with six representative methods in both real-world and simulated environments. Experimental results show that the proposed…
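
The two blocks named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the graph layout (every channel connected within a frame, neighbouring frames connected within a channel), the standard symmetric-normalized GCN propagation rule, and the mean-activation channel score are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import math


def build_st_adjacency(num_channels, num_frames):
    """Adjacency with self-loops; node (c, t) has index c * num_frames + t."""
    n = num_channels * num_frames
    adj = [[0.0] * n for _ in range(n)]
    for c in range(num_channels):
        for t in range(num_frames):
            i = c * num_frames + t
            # Spatial edges: same frame, every channel (c2 == c is the self-loop).
            for c2 in range(num_channels):
                adj[i][c2 * num_frames + t] = 1.0
            # Temporal edges: same channel, neighbouring frames.
            for t2 in (t - 1, t + 1):
                if 0 <= t2 < num_frames:
                    adj[i][c * num_frames + t2] = 1.0
    return adj


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(D^-1/2 A D^-1/2 H W)."""
    n = len(adj)
    deg = [sum(row) for row in adj]  # never zero because of the self-loops
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def select_channels(feats, num_channels, num_frames, keep):
    """Keep the `keep` channels with the largest mean activation.

    The score is a stand-in (assumption); the paper's channel selection
    block is graph-based and is not reproduced here.
    """
    scores = []
    for c in range(num_channels):
        rows = feats[c * num_frames:(c + 1) * num_frames]
        scores.append(sum(sum(r) for r in rows) / (num_frames * len(rows[0])))
    ranked = sorted(range(num_channels), key=lambda c: scores[c], reverse=True)
    return sorted(ranked[:keep])


# Toy input: 3 channels, 2 frames, 2-dimensional features per node.
C, T = 3, 2
adj = build_st_adjacency(C, T)
feats = [[1.0, 0.0], [1.0, 0.0],   # channel 0
         [0.5, 0.5], [0.5, 0.5],   # channel 1
         [0.0, 0.1], [0.0, 0.1]]   # channel 2 (weak signal)
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity weights for readability
fused = gcn_layer(adj, feats, weight)
kept = select_channels(fused, C, T, keep=2)
```

In a trained system the weight matrix and the channel scores would be learned, and the selected channels' features would be pooled into a single speaker embedding; the sketch only shows how features can mix across channels and frames before the weaker channels are dropped.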

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing