MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms

Seung-bin Kim; Chan-yeong Lim; Jungwoo Heo; Ju-ho Kim; Hyun-seo Shin; Kyo-Won Koo; Ha-Jin Yu

arXiv:2406.07103·eess.AS·June 12, 2024

Open Access · 1 Repo

TL;DR

This paper introduces MR-RawNet, a novel speaker verification system that leverages multi-resolution features and attention mechanisms to improve robustness across variable-length utterances using raw waveforms.

Contribution

The paper proposes a multi-resolution feature extractor and attention block that enhance speaker verification performance on variable-duration utterances from raw waveforms.

Findings

01  Outperforms existing raw waveform-based systems on VoxCeleb1

02  Improves robustness to utterance length variability

03  Effectively captures diverse temporal contexts

Abstract

In speaker verification systems, short utterances present a persistent challenge: performance degrades primarily because they carry insufficient phonetic information to characterize the speakers. To overcome this obstacle, we propose a novel structure, MR-RawNet, designed to enhance the robustness of speaker verification systems against variable duration utterances using raw waveforms. MR-RawNet extracts time-frequency representations from raw waveforms via a multi-resolution feature extractor that optimally adjusts both temporal and spectral resolutions simultaneously. Furthermore, we apply a multi-resolution attention block that focuses on diverse and extensive temporal contexts, ensuring robustness against changes in utterance length. The experimental results, conducted on the VoxCeleb1 dataset, demonstrate that MR-RawNet exhibits superior performance in…
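The trade-off between temporal and spectral resolution that the abstract refers to can be illustrated with a small sketch. This is not the MR-RawNet feature extractor, which is learned from raw waveforms; it is only a fixed short-time Fourier transform computed at several window lengths, using NumPy. The function names, the window lengths (128, 256, 512), the half-window hop, and the 440 Hz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def stft_magnitude(x, win_len, hop):
    """Magnitude spectrogram with a Hann window; returns (frames, bins)."""
    n_frames = 1 + (len(x) - win_len) // hop
    window = np.hanning(win_len)
    # Slice the waveform into overlapping frames and apply the window.
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def multi_resolution(x, win_lens=(128, 256, 512)):
    """Hypothetical helper: one spectrogram per window length (hop = half window)."""
    return {w: stft_magnitude(x, w, w // 2) for w in win_lens}

sr = 16000                                # assumed sampling rate in Hz
t = np.arange(sr) / sr                    # one second of audio
x = np.sin(2 * np.pi * 440 * t)           # 440 Hz test tone
specs = multi_resolution(x)

for w, s in specs.items():
    # Short windows: many frames, few frequency bins.
    # Long windows: few frames, many frequency bins.
    print(f"window={w:4d}  frames={s.shape[0]:4d}  bins={s.shape[1]:4d}")
```

Shorter windows yield more frames along the time axis and fewer frequency bins, while longer windows do the opposite. A multi-resolution front end combines several such views instead of committing to a single fixed trade-off.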

Code & Models

Repositories

kimho1wq/mr-rawnet (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing