Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention

Yanxiong Li; Wucheng Wang; Hao Chen; Wenchang Cao; Wei Li; Qianhua He

arXiv:2204.11180·eess.AS·April 26, 2022·1 cites

Open Access

TL;DR

This paper introduces a novel few-shot speaker identification approach using a depthwise separable convolutional network with channel attention, trained with prototypical loss, to reduce overfitting and improve accuracy.

Contribution

It presents a new model architecture combining depthwise separable convolutions and channel attention for few-shot speaker ID, addressing overfitting issues in limited data scenarios.
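One reason depthwise separable convolutions are often chosen for limited-data settings is that they use far fewer weights than standard convolutions. The sketch below compares the two weight counts for a single layer. The kernel size and channel counts are hypothetical values picked for illustration and are not taken from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weight count of a k x k depthwise convolution (one filter per input
    channel) followed by a 1 x 1 pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
k, c_in, c_out = 3, 64, 128
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(std / sep, 2))  # 73728 8768 8.41
```

For this hypothetical layer the separable form needs roughly one eighth of the weights, which is consistent with the goal of reducing overfitting, although the actual layer configuration used by the authors is not given here.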

Findings

01 Exceeds state-of-the-art few-shot speaker identification methods in accuracy and F-score

02 Effective in small-sample speaker identification tasks

03 Evaluated on datasets extracted from three public speech corpora: Aishell-2, VoxCeleb1 and TORGO

Abstract

Although few-shot learning has attracted much attention from the fields of image and audio classification, few efforts have been made on few-shot speaker identification. In the task of few-shot learning, overfitting is a tough problem mainly due to the mismatch between training and testing conditions. In this paper, we propose a few-shot speaker identification method which can alleviate the overfitting problem. In the proposed method, the model of a depthwise separable convolutional network with channel attention is trained with a prototypical loss function. Experimental datasets are extracted from three public speech corpora: Aishell-2, VoxCeleb1 and TORGO. Experimental results show that the proposed method exceeds state-of-the-art methods for few-shot speaker identification in terms of accuracy and F-score.
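To illustrate what training with a prototypical loss involves, here is a minimal sketch in plain Python. It assumes the common formulation in which each speaker's prototype is the mean of that speaker's support embeddings and queries are scored by negative squared Euclidean distance. The abstract does not specify the distance or the episode layout, so those choices, the function names, and the toy two-dimensional embeddings are all assumptions.

```python
import math


def prototype(embeddings):
    """Element-wise mean of a list of equal-length embedding vectors."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def sq_dist(a, b):
    """Squared Euclidean distance (assumed metric, not confirmed by the source)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypical_loss(support, queries):
    """support: dict speaker -> list of embeddings.
    queries: list of (embedding, true_speaker) pairs.
    Returns the mean negative log-probability of the true speaker."""
    protos = {spk: prototype(embs) for spk, embs in support.items()}
    total = 0.0
    for emb, true_spk in queries:
        logits = {spk: -sq_dist(emb, p) for spk, p in protos.items()}
        m = max(logits.values())  # subtract the max for a stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total -= logits[true_spk] - log_norm
    return total / len(queries)


def predict(support, emb):
    """Assign a query embedding to the speaker with the nearest prototype."""
    protos = {spk: prototype(embs) for spk, embs in support.items()}
    return min(protos, key=lambda spk: sq_dist(emb, protos[spk]))


# Toy 2-way, 2-shot episode with made-up embeddings.
support = {
    "spk_a": [[0.0, 0.0], [0.2, 0.0]],
    "spk_b": [[1.0, 1.0], [1.0, 0.8]],
}
queries = [([0.1, 0.1], "spk_a"), ([0.9, 1.0], "spk_b")]
loss = prototypical_loss(support, queries)
print(loss, predict(support, [0.1, 0.1]))
```

In a real system the embeddings would come from the network and the loss would be backpropagated through it. Here the embeddings are fixed numbers, so the sketch only shows how the loss and the nearest-prototype decision are computed.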


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing