Bidirectional Multiscale Feature Aggregation for Speaker Verification

Jiajun Qi; Wu Guo; Bin Gu

arXiv:2104.00230 · eess.AS · April 2, 2021 · Interspeech

Open Access

TL;DR

This paper introduces a bidirectional multiscale feature aggregation network with attentional fusion modules for text-independent speaker verification, improving feature integration and verification accuracy.

Contribution

The paper presents a bidirectional aggregation framework in which attentional fusion modules compute the weights used to combine feature maps from different backbone stages, replacing simple concatenation or element-wise addition.

Findings

01. Verification accuracy improves on the NIST SRE16 and VoxCeleb1 datasets.
02. The bidirectional aggregation strategy is shown to be effective.
03. The attentional fusion modules further improve performance.

Abstract

In this paper, we propose a novel bidirectional multiscale feature aggregation (BMFA) network with attentional fusion modules for text-independent speaker verification. The feature maps from different stages of the backbone network are iteratively combined and refined in both a bottom-up and top-down manner. Furthermore, instead of simple concatenation or element-wise addition of feature maps from different stages, an attentional fusion module is designed to compute the fusion weights. Experiments are conducted on the NIST SRE16 and VoxCeleb1 datasets. The experimental results demonstrate the effectiveness of the bidirectional aggregation strategy and show that the proposed attentional fusion module can further improve the performance.
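
The two ideas in the abstract, bidirectional aggregation and learned fusion weights, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption rather than the authors' design: the `AttentionalFusion` class, its pooled-statistics gate, the `bidirectional_aggregate` helper, the order of the two passes (top-down first, then bottom-up), and the simplification that all stage feature maps have already been brought to a common shape. The weights are random, so the sketch shows only the data flow, not a trained model.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class AttentionalFusion:
    """Hypothetical fusion module: a channel-wise gate replaces plain addition."""

    def __init__(self, channels, hidden, rng):
        # Small random projections stand in for learned parameters.
        self.w1 = rng.standard_normal((channels, hidden)) * 0.1
        self.w2 = rng.standard_normal((hidden, channels)) * 0.1

    def __call__(self, a, b):
        # a, b: feature maps of identical shape (channels, freq, time).
        pooled = (a + b).mean(axis=(1, 2))               # (channels,)
        hidden = np.maximum(pooled @ self.w1, 0.0)       # ReLU bottleneck
        gate = sigmoid(hidden @ self.w2)[:, None, None]  # (channels, 1, 1)
        # Convex combination: the gate is the fusion weight for `a`.
        return gate * a + (1.0 - gate) * b


def bidirectional_aggregate(stage_maps, top_down_fusers, bottom_up_fusers):
    """Fuse stage outputs deepest-to-shallowest, then shallowest-to-deepest."""
    n = len(stage_maps)

    # Top-down pass (assumed order): start from the deepest stage.
    top_down = [None] * n
    top_down[-1] = stage_maps[-1]
    for i in range(n - 2, -1, -1):
        top_down[i] = top_down_fusers[i](stage_maps[i], top_down[i + 1])

    # Bottom-up pass: refine the top-down results from the shallowest stage.
    bottom_up = [None] * n
    bottom_up[0] = top_down[0]
    for i in range(1, n):
        bottom_up[i] = bottom_up_fusers[i - 1](top_down[i], bottom_up[i - 1])
    return bottom_up


rng = np.random.default_rng(0)
channels, freq, time = 8, 4, 10
# Three stand-in backbone stages, assumed already resized to one shape.
stage_maps = [rng.standard_normal((channels, freq, time)) for _ in range(3)]
td_fusers = [AttentionalFusion(channels, 4, rng) for _ in range(2)]
bu_fusers = [AttentionalFusion(channels, 4, rng) for _ in range(2)]
outputs = bidirectional_aggregate(stage_maps, td_fusers, bu_fusers)
```

In a real backbone the stages typically differ in resolution and channel count, so a resampling or projection step would be needed before fusion; that step is omitted here, and the abstract does not specify how the paper handles it.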

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing