Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemetic Analysis

Seong-Hu Kim; Hyeonuk Nam; Yong-Hwa Park

arXiv:2110.03213·eess.AS·February 9, 2022·1 cites

TL;DR

This paper introduces a temporal dynamic CNN that adapts kernels over time to phoneme variations, improving speaker verification accuracy without explicit phoneme labels.

Contribution

The proposed TDY-CNN dynamically adapts kernels to phoneme variations over time, enhancing text-independent speaker verification performance.

Findings

01 TDY-ResNet-38(x0.5) with six basis kernels improved EER by 17.3% compared to the baseline ResNet-38(x0.5)

02 Adaptive kernels are phoneme-specific, especially in early layers

03 Temporal dynamic modeling enhances robustness in speaker verification
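
The EER figure in the first finding can be made concrete with a small sketch. The code below is an illustration only, not the authors' evaluation script: the function names `equal_error_rate` and `relative_improvement`, the toy scores, and the EER values in the usage lines are all hypothetical. It also assumes the 17.3% is a relative reduction, which the page does not state explicitly.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the false-accept rate (FAR) and false-reject rate
    (FRR) are closest, reported as their average.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for th in thresholds:
        far = sum(s >= th for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < th for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_eer, new_eer):
    """Fractional reduction of the EER with respect to the baseline."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy similarity scores (hypothetical, not from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]    # same-speaker trials
impostor = [0.6, 0.3, 0.2, 0.1]   # different-speaker trials
eer = equal_error_rate(genuine, impostor)

# Hypothetical EER values chosen only to illustrate the arithmetic.
gain = relative_improvement(baseline_eer=2.0, new_eer=1.654)
```

Scanning observed scores as thresholds is the simplest approach; evaluation toolkits usually interpolate the ROC curve instead, which gives slightly different values on small trial lists.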

Abstract

In text-independent speaker recognition, dynamic models that adapt along the time axis have been proposed to account for the phoneme-varying characteristics of speech. However, detailed analysis of how such dynamic models behave across phonemes is lacking. In this paper, we propose a temporal dynamic CNN (TDY-CNN) that accounts for the temporal variation of phonemes by applying kernels optimally adapted to each time bin. Each adaptive kernel is a weighted sum of trained basis kernels. We then analyze how the adaptive kernels act on different phonemes in various layers. TDY-ResNet-38(x0.5) with six basis kernels improved the equal error rate (EER), the speaker verification performance metric, by 17.3% compared to the baseline model ResNet-38(x0.5). In addition, we showed that adaptive kernels depend on phoneme groups and are more phoneme-specific at…
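
The core idea in the abstract, a kernel per time bin formed as a weighted sum of trained basis kernels, can be sketched in a few lines of NumPy. This is a simplified 1D illustration under stated assumptions, not the authors' implementation: the function `temporal_dynamic_conv1d`, the tensor shapes, and the use of a softmax over externally supplied `attention_logits` are all assumptions made here, and how the attention weights are produced from the input is left out.

```python
import numpy as np


def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_dynamic_conv1d(x, basis_kernels, attention_logits):
    """Convolve along time with a different kernel at every time bin.

    x:                (C_in, T) input features
    basis_kernels:    (K, C_out, C_in, W) trained basis kernels, W odd
    attention_logits: (K, T) unnormalized weights per basis kernel and time bin
                      (assumed given; a real model would compute them from x)
    returns:          (C_out, T) output features
    """
    K, C_out, C_in, W = basis_kernels.shape
    T = x.shape[1]
    pi = softmax(attention_logits, axis=0)       # weights sum to 1 over K
    pad = W // 2
    x_pad = np.pad(x, ((0, 0), (pad, pad)))      # zero-pad the time axis
    y = np.zeros((C_out, T))
    for t in range(T):
        # Adaptive kernel for time bin t: weighted sum of the basis kernels.
        kernel_t = np.einsum("k,koiw->oiw", pi[:, t], basis_kernels)
        window = x_pad[:, t:t + W]               # (C_in, W) slice around t
        y[:, t] = np.einsum("oiw,iw->o", kernel_t, window)
    return y


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 10))                     # 3 channels, 10 time bins
basis = rng.normal(size=(6, 4, 3, 3))            # six basis kernels
logits = rng.normal(size=(6, 10))
y = temporal_dynamic_conv1d(x, basis, logits)
print(y.shape)  # (4, 10)
```

The explicit loop over time bins keeps the per-bin kernel visible. Because convolution is linear in the kernel, the same output can be obtained by convolving with each basis kernel once and mixing the results with the per-bin weights, which is typically how such a layer would be batched.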

Code & Models

Repositories

shkim816/temporal_dynamic_cnn
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing