DSARSR: Deep Stacked Auto-encoders Enhanced Robust Speaker Recognition

Zhifeng Wang; Chunyan Zeng; Surong Duan; Hongjie Ouyang; Hongmin Xu

arXiv:2307.02751·cs.SD·July 7, 2023

Open Access

TL;DR

This paper introduces DSARSR, a deep learning approach using stacked auto-encoders to enhance the robustness and accuracy of speaker recognition systems, especially under cross-channel conditions.

Contribution

The paper proposes replacing PLDA with stacked auto-encoders for i-vector reconstruction, improving robustness and performance in speaker recognition.

Findings

1. Outperforms state-of-the-art methods in accuracy
2. Improves robustness under cross-channel conditions
3. Reduces dimensionality of i-vectors effectively

Abstract

Speaker recognition is a biometric modality that uses a speaker's speech segments to recognize identity, determining whether the test speaker is one of the enrolled speakers. To improve the robustness of the i-vector framework under cross-channel conditions and to explore a novel method for applying deep learning to speaker recognition, stacked auto-encoders are used to extract an abstract representation of the i-vector instead of applying PLDA. After pre-processing and feature extraction, speaker- and channel-independent speech is used for UBM training. The UBM is then used to extract the i-vectors of the enrollment and test speech. Unlike the traditional i-vector framework, which uses linear discriminant analysis (LDA) to reduce dimension and increase the discrimination between speaker subspaces, this research uses stacked auto-encoders to reconstruct the…
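As a rough illustration of the idea in the abstract, the sketch below trains two auto-encoder layers greedily on i-vector-like inputs and compares two encoded vectors with cosine similarity. It is a minimal sketch under stated assumptions, not the authors' implementation: the random 20-dimensional "i-vectors", the layer sizes (10 and 5), the tanh activation, the plain gradient-descent training, the cosine scoring, and all function names are hypothetical choices made for illustration. The UBM training and i-vector extraction steps are not shown.

```python
import numpy as np

def train_autoencoder(X, hidden_dim, epochs=300, lr=0.05, seed=0):
    """Train one auto-encoder layer (tanh encoder, linear decoder) by gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden_dim))
    b1 = np.zeros(hidden_dim)
    W2 = rng.normal(0.0, 0.1, (hidden_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encoder output
        err = (H @ W2 + b2) - X           # reconstruction error
        # Loss: squared error summed over dimensions, averaged over samples.
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        g = 2.0 * err / n                 # gradient of the loss w.r.t. the reconstruction
        dH = g @ W2.T                     # back-propagate through the decoder
        dZ = dH * (1.0 - H ** 2)          # back-propagate through tanh
        W2 -= lr * (H.T @ g)
        b2 -= lr * g.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)
    return (W1, b1), losses

def encode(X, layers):
    """Pass inputs through the stacked encoder layers."""
    for W, b in layers:
        X = np.tanh(X @ W + b)
    return X

def cosine_score(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stand-in for extracted i-vectors: 200 random 20-dimensional vectors.
rng = np.random.default_rng(42)
ivectors = rng.normal(size=(200, 20))

# Greedy layer-wise training: each layer is trained on the previous layer's codes.
layer1, losses1 = train_autoencoder(ivectors, hidden_dim=10, seed=1)
codes1 = encode(ivectors, [layer1])
layer2, losses2 = train_autoencoder(codes1, hidden_dim=5, seed=2)

stack = [layer1, layer2]
embeddings = encode(ivectors, stack)

# Score an "enrollment" vector against a "test" vector in the encoded space.
score = cosine_score(embeddings[0], embeddings[1])
print(embeddings.shape, round(score, 3))
```

Here the stacked encoder also reduces each 20-dimensional input to 5 dimensions, which mirrors the dimensionality-reduction role the abstract contrasts with LDA. Real i-vectors, a real scoring back end, and the network settings used in the paper would replace these placeholders.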

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing