Segment Aggregation for short utterances speaker verification using raw waveforms

Seung-bin Kim; Jee-weon Jung; Hye-jin Shim; Ju-ho Kim; Ha-Jin Yu

arXiv:2005.03329·eess.AS·August 5, 2020·1 cites

TL;DR

This paper introduces segment aggregation, a method that splits an input utterance into short segments and combines the segment embeddings into a single speaker embedding, reducing the performance degradation that speaker verification systems show on short utterances.

Contribution

The paper proposes a novel ensemble-based segment aggregation approach and a modified teacher-student training method to improve short-utterance speaker verification.

Findings

01. Achieves a relative improvement of about 45.37% over the baseline system on VoxCeleb1 with 1-second test utterances.

02. Remains robust across different input durations.

03. Improves the stability and accuracy of speaker verification on short speech segments.

Abstract

Most studies on speaker verification systems focus on long-duration utterances, which contain sufficient phonetic information. However, the performance of these systems is known to degrade when short-duration utterances are input, because they carry less phonetic information than long utterances. In this paper, we propose a method, referred to as "segment aggregation", that compensates for the performance degradation of speaker verification on short utterances. The proposed method adopts an ensemble-based design to improve the stability and accuracy of speaker verification systems. It segments an input utterance into several short utterances and then aggregates the segment embeddings extracted from the segmented inputs to compose a speaker embedding. The method then simultaneously trains the segment embeddings and the aggregated speaker embedding.…
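
The segment-then-aggregate step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the segment length, the hop size, the use of a plain average as the aggregation operator, and the `toy_embed` stand-in for a neural embedding extractor are all assumptions, since the abstract does not specify them. The joint training of segment and speaker embeddings is not shown.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def split_into_segments(
    samples: Sequence[float], segment_len: int, hop: int
) -> List[Sequence[float]]:
    """Cut a waveform into fixed-length segments, advancing by `hop` samples.

    Trailing samples that do not fill a whole segment are dropped.
    """
    if segment_len <= 0 or hop <= 0:
        raise ValueError("segment_len and hop must be positive")
    if len(samples) < segment_len:
        raise ValueError("utterance is shorter than one segment")
    last_start = len(samples) - segment_len
    return [samples[s:s + segment_len] for s in range(0, last_start + 1, hop)]


def toy_embed(segment: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a neural extractor: [mean, mean absolute value]."""
    n = len(segment)
    return [sum(segment) / n, sum(abs(x) for x in segment) / n]


def aggregate_mean(segment_embeddings: Sequence[Vector]) -> Vector:
    """Average the segment embeddings (assumed operator; the paper's may differ)."""
    n = len(segment_embeddings)
    dim = len(segment_embeddings[0])
    return [sum(e[i] for e in segment_embeddings) / n for i in range(dim)]


def speaker_embedding(
    samples: Sequence[float],
    segment_len: int,
    hop: int,
    embed: Callable[[Sequence[float]], Vector] = toy_embed,
) -> Tuple[List[Vector], Vector]:
    """Return the per-segment embeddings and the aggregated speaker embedding."""
    segments = split_into_segments(samples, segment_len, hop)
    segment_embeddings = [embed(seg) for seg in segments]
    return segment_embeddings, aggregate_mean(segment_embeddings)


if __name__ == "__main__":
    waveform = [1.0, -1.0, 2.0, -2.0, 3.0, -3.0, 4.0, -4.0]
    seg_embs, spk_emb = speaker_embedding(waveform, segment_len=4, hop=2)
    print(len(seg_embs), "segments ->", spk_emb)
```

In a real system `embed` would be a trained network operating on raw waveform segments, and both the segment embeddings and the aggregated embedding would feed training losses, as the abstract indicates.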

Code & Models

Repositories

kimho1wq/SegmentAggregation (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing