Rapid Connectionist Speaker Adaptation

Michael Witbrock; Patrick Haffner

arXiv:2211.08978·cs.SD·November 17, 2022

TL;DR

This paper introduces SVCnet, a neural network-based system that models speaker variability and enables rapid adaptation of speech recognition systems to new speakers using minimal voice samples.

Contribution

The paper presents a novel neural network system, SVCnet, for modeling speaker variability and enabling quick speaker adaptation without retraining the recognition system.

Findings

01

SVCnet effectively models speaker variability.

02

The system enables rapid speaker adaptation with minimal data.

03

The paper describes a system that combines SVCnet with an MS-TDNN recognizer.

Abstract

We present SVCnet, a system for modelling speaker variability. Encoder Neural Networks specialized for each speech sound produce low dimensionality models of acoustical variation, and these models are further combined into an overall model of voice variability. A training procedure is described which minimizes the dependence of this model on which sounds have been uttered. Using the trained model (SVCnet) and a brief, unconstrained sample of a new speaker's voice, the system produces a Speaker Voice Code that can be used to adapt a recognition system to the new speaker without retraining. A system which combines SVCnet with an MS-TDNN recognizer is described.
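The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give layer sizes, the way per-sound codes are combined, or the training procedure, so every name and dimension here (`N_SOUNDS`, `FEAT_DIM`, `CODE_DIM`, `SVC_DIM`, `speaker_voice_code`) is hypothetical. Random weights stand in for trained networks, frames are simply averaged per sound, and a presence flag marks sounds that were never uttered.

```python
import math
import random

random.seed(0)

N_SOUNDS = 4   # hypothetical number of speech-sound classes
FEAT_DIM = 8   # hypothetical acoustic feature size per frame
CODE_DIM = 2   # hypothetical size of each low-dimensional per-sound code
SVC_DIM = 3    # hypothetical size of the overall Speaker Voice Code


def rand_matrix(rows, cols):
    """Small random weight matrix stored as a list of rows."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(vec, weights):
    """Compute tanh(vec @ weights) for a list vector and a list-of-rows matrix."""
    return [
        math.tanh(sum(v * row[j] for v, row in zip(vec, weights)))
        for j in range(len(weights[0]))
    ]


# Assumption: random weights stand in for trained encoder and combiner networks.
encoders = [rand_matrix(FEAT_DIM, CODE_DIM) for _ in range(N_SOUNDS)]
combiner = rand_matrix(N_SOUNDS * (CODE_DIM + 1), SVC_DIM)


def speaker_voice_code(frames, labels):
    """Map labelled frames from one speaker to a single fixed-size code.

    frames: list of FEAT_DIM-long feature vectors.
    labels: speech-sound index (0..N_SOUNDS-1) for each frame.
    """
    parts = []
    for s in range(N_SOUNDS):
        selected = [f for f, lab in zip(frames, labels) if lab == s]
        if not selected:
            # Sound not uttered in this sample: zero code, presence flag 0.
            parts += [0.0] * CODE_DIM + [0.0]
        else:
            mean = [sum(col) / len(selected) for col in zip(*selected)]
            # One encoder per speech sound, followed by presence flag 1.
            parts += project(mean, encoders[s]) + [1.0]
    # Combine all per-sound codes into one overall voice code.
    return project(parts, combiner)


# Usage: a short, unconstrained sample with arbitrary sound coverage.
frames = [[random.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(50)]
labels = [random.randrange(N_SOUNDS) for _ in range(50)]
svc = speaker_voice_code(frames, labels)
print(len(svc))
```

The output has the same length whichever sounds occur in the sample, which is the property a downstream recognizer would need in order to take the code as an extra input. The sketch does not attempt the training procedure that makes the code insensitive to which sounds were uttered.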
