Time-frequency Network for Robust Speaker Recognition

Jiguo Li; Tianzi Zhang; Xiaobin Liu; Lirong Zheng

arXiv:2303.02673 · cs.SD · March 8, 2023 · 1 cites

TL;DR

This paper introduces a time-frequency neural network that extracts and fuses features from the time and frequency domains for speaker recognition, and reports better accuracy than existing methods on TIMIT and LibriSpeech.

Contribution

The paper proposes a novel deep neural network architecture that fuses time and frequency domain features for more robust speaker recognition.

Findings

01. Outperforms state-of-the-art methods on TIMIT and LibriSpeech datasets.

02. Effectively combines time and frequency domain information.

03. Demonstrates improved recognition accuracy with the proposed fusion approach.

Abstract

The wide deployment of speech-based biometric systems demands high-performance speaker recognition algorithms. However, most prior work on speaker recognition processes speech in either the frequency domain or the time domain, which may produce suboptimal results because both domains are important for speaker recognition. In this paper, we analyze the speech signal in both the time and frequency domains and propose the time-frequency network (TFN) for speaker recognition, which extracts and fuses features from the two domains. Building on recent advances in deep neural networks, we propose a convolutional neural network that encodes the raw speech waveform and the frequency spectrum into domain-specific features, which are then fused and transformed into a classification feature space for speaker recognition. Experimental results on the publicly…
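The two-branch idea in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' architecture: the single convolution layer per branch, the log-magnitude DFT used as the "frequency spectrum", the average pooling, fusion by concatenation, and every name below (`conv1d_relu`, `encode`, `magnitude_spectrum`, `tfn_forward`) are assumptions made for illustration, and the weights are random rather than trained.

```python
import cmath
import math
import random


def conv1d_relu(signal, kernel):
    """Valid 1-D cross-correlation of signal with kernel, followed by ReLU."""
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
        for i in range(len(signal) - k + 1)
    ]


def encode(signal, kernels):
    """One conv layer plus global average pooling: one feature per kernel."""
    feats = []
    for kernel in kernels:
        act = conv1d_relu(signal, kernel)
        feats.append(sum(act) / len(act))
    return feats


def magnitude_spectrum(wave):
    """Log-compressed magnitude of a naive DFT (non-negative frequencies only)."""
    n = len(wave)
    return [
        math.log1p(abs(sum(wave[t] * cmath.exp(-2j * math.pi * f * t / n)
                           for t in range(n))))
        for f in range(n // 2 + 1)
    ]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tfn_forward(wave, params):
    """Encode both domains, fuse by concatenation (an assumption), classify."""
    time_feat = encode(wave, params["time_kernels"])
    freq_feat = encode(magnitude_spectrum(wave), params["freq_kernels"])
    fused = time_feat + freq_feat  # list concatenation
    return [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(params["W"], params["b"])
    ]


def init_params(rng, n_speakers, channels=4, ksize=8):
    """Random, untrained weights; sizes are arbitrary illustration values."""
    def kernels():
        return [[rng.gauss(0, 0.1) for _ in range(ksize)] for _ in range(channels)]
    return {
        "time_kernels": kernels(),
        "freq_kernels": kernels(),
        "W": [[rng.gauss(0, 0.1) for _ in range(2 * channels)]
              for _ in range(n_speakers)],
        "b": [0.0] * n_speakers,
    }


rng = random.Random(0)
params = init_params(rng, n_speakers=10)
wave = [rng.gauss(0, 1) for _ in range(128)]  # stand-in for a raw speech frame
logits = tfn_forward(wave, params)
probs = softmax(logits)  # per-speaker probabilities
```

Here `probs` holds one probability per hypothetical speaker class. Concatenation is only one possible fusion operator; the abstract says the domain-specific features are "fused" without specifying how in the visible text.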

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing