A study of the robustness of raw waveform based speaker embeddings under mismatched conditions

Ge Zhu; Frank Cwitkowitz; Zhiyao Duan

arXiv:2110.04265·eess.AS·October 13, 2021

TL;DR

This paper studies how robust raw-waveform-based speaker embeddings are under mismatched (cross-dataset) conditions. It finds that they degrade more than spectral-based systems in speaker verification, and proposes two remedies: analytic filters and variational dropout.

Contribution

It provides a comprehensive cross-dataset analysis of raw-waveform speaker embeddings and introduces two novel techniques to improve their robustness in mismatched scenarios.

Findings

01

Raw-waveform systems degrade more under mismatched conditions than spectral systems.

02

Using analytic filters improves shift-invariance and robustness.

03

Variational dropout prevents overfitting of irrelevant features.
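To make the third finding more concrete, here is a minimal sketch of the general idea behind variational dropout applied to filter taps. It is not the authors' implementation. It assumes the common formulation with multiplicative Gaussian noise and a learned per-weight `log_alpha`, and it borrows the pruning threshold `log_alpha > 3` from the sparse variational dropout literature. The function names and the example values are hypothetical.

```python
import math
import random


def noisy_weights(weights, log_alphas, rng):
    """Sample w * (1 + sqrt(alpha) * eps) with eps ~ N(0, 1), one draw per weight.

    Assumption: alpha = exp(log_alpha) is a learned per-weight noise level.
    """
    return [
        w * (1.0 + math.sqrt(math.exp(la)) * rng.gauss(0.0, 1.0))
        for w, la in zip(weights, log_alphas)
    ]


def keep_mask(log_alphas, threshold=3.0):
    """Keep weights whose noise level stays below the (assumed) threshold."""
    return [la < threshold for la in log_alphas]


# Hypothetical filter taps with low, moderate, and high learned noise levels.
rng = random.Random(0)
taps = [0.5, -0.2, 0.1]
log_alphas = [-20.0, 0.0, 5.0]

sampled = noisy_weights(taps, log_alphas, rng)
mask = keep_mask(log_alphas)
print(sampled)
print(mask)
```

A tap with a very small `alpha` is passed through almost unchanged, while a tap whose `alpha` grows large is dominated by noise and can be dropped. This is one way such a scheme could discourage filters from fitting dataset-specific detail.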

Abstract

In this paper, we conduct a cross-dataset study on parametric and non-parametric raw-waveform-based speaker embeddings through speaker verification experiments. In general, we observe a more significant performance degradation of these raw-waveform systems compared to spectral-based systems. We then propose two strategies to improve the performance of raw-waveform-based systems on cross-dataset tests. The first strategy is to change the real-valued filters into analytic filters to ensure shift-invariance. The second strategy is to apply variational dropout to non-parametric filters to prevent them from overfitting irrelevant nuance features.
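The shift-invariance argument for analytic filters can be illustrated with a small sketch. It is not the paper's front end. It assumes a 16 kHz sample rate and uses a complex Gabor filter (a Gaussian window times a complex exponential), which is only approximately analytic, and it probes the filter with a steady 1 kHz tone. All names and parameter values here are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 16000  # Hz, assumed for illustration


def gabor_taps(center_hz, length=101, sigma_s=0.001):
    """Complex Gabor filter: Gaussian window times a complex exponential."""
    half = length // 2
    taps = []
    for n in range(-half, half + 1):
        t = n / SAMPLE_RATE
        window = math.exp(-0.5 * (t / sigma_s) ** 2)
        taps.append(window * cmath.exp(2j * math.pi * center_hz * t))
    return taps


def convolve_valid(signal, taps):
    """Direct convolution, keeping only fully overlapping positions."""
    k = len(taps)
    return [
        sum(signal[i + j] * taps[k - 1 - j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def tone(freq_hz, num_samples, delay=0):
    """Cosine tone, optionally delayed by a whole number of samples."""
    return [
        math.cos(2 * math.pi * freq_hz * (n - delay) / SAMPLE_RATE)
        for n in range(num_samples)
    ]


taps = gabor_taps(1000.0)
out_a = convolve_valid(tone(1000.0, 400), taps)
out_b = convolve_valid(tone(1000.0, 400, delay=4), taps)  # quarter-period shift

# The real part equals the output of the real-valued (cosine) filter alone,
# because the input signal is real.
scale = max(abs(y) for y in out_a)
real_change = max(abs(a.real - b.real) for a, b in zip(out_a, out_b)) / scale
envelope_change = max(abs(abs(a) - abs(b)) for a, b in zip(out_a, out_b)) / scale

print(f"real-valued response change: {real_change:.3f}")
print(f"modulus change: {envelope_change:.5f}")
```

In this toy setting, a four-sample delay changes the real-valued filter response substantially at a fixed output position, while the modulus of the complex response barely moves. That is the sense in which taking the magnitude of an analytic filter output is less sensitive to small time shifts.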

Code & Models

Repositories

gzhu06/tdspkr-mismatch-study
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques

Methods: Variational Dropout · Dropout