PF-Net: Personalized Filter for Speaker Recognition from Raw Waveform

Wencheng Li; Zhenhua Tan; Jingyu Ning; Zhenche Xia; Danke Wu

arXiv:2105.14826·eess.AS·June 22, 2022·1 cites

TL;DR

PF-Net is a convolutional neural network architecture whose first layer learns personalized filters from raw waveforms. For speaker recognition, it converges faster than a standard CNN and achieves higher recognition accuracy than SincNet.

Contribution

The paper presents PF-Net, an improved CNN architecture that learns more detailed and personalized filters from raw speech data, enhancing speaker recognition performance.

Findings

01 PF-Net converges faster than a standard CNN.

02 PF-Net outperforms SincNet in recognition accuracy.

03 PF-Net learns more characteristic filter parameters.

Abstract

Deep learning has replaced i-vectors as the basis for speaker recognition. In recent years, Convolutional Neural Networks (CNNs) that learn low-level speech representations directly from raw waveforms have been widely used for this task. Building on this, a CNN architecture called SincNet introduces a distinctive convolutional layer that implements band-pass filters. Whereas a standard CNN learns every element of each filter, SincNet learns only the low and high cut-off frequencies of each filter. This paper proposes an improved CNN architecture called PF-Net, which encourages the first convolutional layer to implement more personalized filters than SincNet. PF-Net parameterizes the shape of each filter in the frequency domain and realizes band-pass filters by learning a set of deformation points in the frequency domain. Compared with a standard CNN, PF-Net can learn the characteristics of each filter. Compared with…
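To make the SincNet baseline mentioned in the abstract concrete, here is a minimal NumPy sketch of a band-pass kernel defined only by a low and a high cut-off frequency, built as the difference of two windowed-sinc low-pass filters. The function name, kernel length, sample rate, and Hamming window are illustrative assumptions. The cut-offs are fixed numbers here, whereas in a trained network they would be learnable parameters. This is not PF-Net's deformation-point parameterization, which the abstract does not specify in detail.

```python
import numpy as np


def sinc_bandpass_kernel(f_low_hz, f_high_hz, kernel_size=251, sample_rate=16000):
    """Return a windowed-sinc band-pass FIR kernel (hypothetical helper).

    The kernel is the difference of two ideal low-pass filters with
    cut-offs f_high_hz and f_low_hz, tapered by a Hamming window.
    """
    if not 0 <= f_low_hz < f_high_hz <= sample_rate / 2:
        raise ValueError("need 0 <= f_low_hz < f_high_hz <= sample_rate / 2")
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size should be odd so the kernel is symmetric")

    # Sample indices centered on zero: -(K-1)/2, ..., 0, ..., (K-1)/2.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2

    # Cut-offs normalized by the sample rate (cycles per sample).
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate

    # np.sinc(x) = sin(pi x) / (pi x), so 2 f np.sinc(2 f n) is the
    # impulse response of an ideal low-pass filter with cut-off f.
    kernel = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)

    # Taper the truncated response to reduce ripple.
    return kernel * np.hamming(kernel_size)


if __name__ == "__main__":
    kernel = sinc_bandpass_kernel(1000.0, 2000.0)
    # Zero-pad to 16000 points so that FFT bin k corresponds to k Hz.
    response = np.abs(np.fft.rfft(kernel, 16000))
    print("gain at 1500 Hz:", response[1500])
    print("gain at 4000 Hz:", response[4000])
```

Only two numbers per filter determine the whole kernel in this sketch, which is the constraint the abstract says PF-Net relaxes by learning the frequency-domain shape instead.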

Code & Models

Repositories

tan-openlab/pf-net (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing