Speaker recognition improvement using blind inversion of distortions

Marcos Faundez-Zanuy; Jordi Sole-Casals

arXiv:2203.01164 · cs.SD · March 3, 2022 · 1 citation

TL;DR

This paper introduces a method to invert nonlinear distortions in speech signals to enhance speaker recognition accuracy, especially under saturation conditions, by combining data fusion techniques with distortion compensation.

Contribution

It presents a novel approach to invert nonlinear distortions in speech signals, improving recognition rates under saturation mismatch conditions.

Findings

1. With saturated test sentences, the recognition rate improved from 80% to 88.57%.

2. With clean (unsaturated) speech, the recognition rate was 87.76% for one microphone.

3. Combining data fusion with nonlinear distortion compensation improves recognition rates on saturated test sentences.

Abstract

In this paper we propose the inversion of nonlinear distortions in order to improve the recognition rates of a speaker recognition system. We study the effect of saturations on the test signals, trying to take into account real situations where the training material has been recorded in a controlled situation but the testing signals present some mismatch with the input signal level (saturations). The experimental results show that a combination of data fusion with and without nonlinear distortion compensation can improve the recognition rates with saturated test sentences from 80% to 88.57%, while the result with clean speech (without saturation) is 87.76% for one microphone.
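
As a rough illustration of the setting the abstract describes, the sketch below simulates a saturated test signal and fuses two sets of per-speaker scores. It is not the authors' method: the hard-clipping saturation model, the weighted-average fusion rule, the function names (`saturate`, `fuse_scores`, `identify`), and the toy scores are all assumptions made for the example, and the blind inversion step itself is not shown.

```python
import math

def saturate(signal, limit):
    """Hard-clip each sample to [-limit, +limit] (assumed saturation model)."""
    return [max(-limit, min(limit, x)) for x in signal]

def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted average of two per-speaker score lists (assumed fusion rule)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]

def identify(scores):
    """Return the index of the highest-scoring speaker."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Toy test signal: a sine wave whose peaks exceed the recording range.
clean = [1.5 * math.sin(2 * math.pi * n / 16) for n in range(32)]
clipped = saturate(clean, limit=1.0)

# Hypothetical similarity scores for three enrolled speakers (higher is better),
# one list from the saturated signal as-is and one after distortion compensation.
scores_raw = [0.40, 0.35, 0.25]
scores_compensated = [0.30, 0.50, 0.20]

fused = fuse_scores(scores_raw, scores_compensated)
best = identify(fused)
```

Here the two classifiers disagree (the uncompensated scores favor speaker 0, the compensated ones favor speaker 1), and the fused scores decide between them. In a real system the scores would come from speaker models trained on clean recordings.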

Taxonomy

Topics: Speech and Audio Processing · Blind Source Separation Techniques · Speech Recognition and Synthesis