Robust Channel Learning for Large-Scale Radio Speaker Verification

Wenhao Yang; Jianguo Wei; Wenhuan Lu; Lei Li; Xugang Lu

arXiv:2406.10956·cs.SD·June 18, 2024

TL;DR

This paper introduces a robust speaker verification framework tailored for radio communications, employing data augmentation, noise handling, and efficient fine-tuning to improve accuracy under challenging channel conditions.

Contribution

It presents a novel Channel Robust Speaker Learning framework with data augmentation, noise modeling, and efficient transfer learning, plus a large-scale radio speech benchmark.

Findings

01. Enhanced speaker verification accuracy in radio scenarios.

02. Effective mitigation of bandwidth and noise effects.

03. Reduced training time with efficient fine-tuning.

Abstract

Recent research in speaker verification has increasingly focused on achieving robust and reliable recognition under challenging channel conditions and noisy environments. Identifying speakers in radio communications is particularly difficult due to inherent limitations such as constrained bandwidth and pervasive noise interference. To address this issue, we present a Channel Robust Speaker Learning (CRSL) framework that enhances the robustness of the current speaker verification pipeline, considering data source, data augmentation, and the efficiency of model transfer processes. Our framework introduces an augmentation module that mitigates bandwidth variations in radio speech datasets by manipulating the bandwidth of training inputs. It also addresses unknown noise by introducing noise within the manifold space. Additionally, we propose an efficient fine-tuning method that reduces the…
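
The abstract names two mechanisms: manipulating the bandwidth of training inputs, and introducing noise within the manifold space. The Python sketch below shows one way such augmentations could look; it is not the authors' implementation. The function names, the brick-wall FFT low-pass filter, the cutoff range, and the reading of "manifold space" as intermediate feature vectors perturbed with Gaussian noise are all assumptions made for illustration.

```python
import numpy as np


def band_limit(wave, sample_rate, cutoff_hz):
    """Remove spectral content above cutoff_hz with a brick-wall FFT low-pass.

    Assumption: a simple stand-in for a narrow-band radio channel.
    """
    spectrum = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(wave))


def random_band_limit(wave, sample_rate, rng, low_hz=2000.0, high_hz=8000.0):
    """Band-limit a waveform at a cutoff drawn uniformly from [low_hz, high_hz].

    The cutoff range is hypothetical, not taken from the paper.
    """
    cutoff = rng.uniform(low_hz, high_hz)
    return band_limit(wave, sample_rate, cutoff), cutoff


def add_feature_noise(features, rng, scale=0.1):
    """Perturb intermediate features with Gaussian noise.

    Noise is scaled by each dimension's standard deviation over the batch.
    Assumption: "manifold space" is read here as hidden-layer features.
    """
    std = features.std(axis=0, keepdims=True)
    return features + scale * std * rng.standard_normal(features.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    t = np.arange(sr) / sr
    # Toy input: a 1 kHz tone plus a 6 kHz tone, one second long.
    wave = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 6000 * t)
    narrow, cutoff = random_band_limit(wave, sr, rng)
    print(f"cutoff used: {cutoff:.0f} Hz, output length: {len(narrow)}")
```

Drawing a fresh cutoff for every training utterance exposes the model to a range of effective bandwidths rather than a single fixed one, which is the general idea behind bandwidth augmentation.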

Code & Models

Repositories

wenhao-yang/twowayradio (Official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis