Rapid Connectionist Speaker Adaptation
Michael Witbrock, Patrick Haffner

TL;DR
This paper introduces SVCnet, a neural network-based system that models speaker variability and adapts a speech recognition system to a new speaker from a brief, unconstrained voice sample, without retraining.
Contribution
The paper presents a novel neural network system, SVCnet, for modeling speaker variability and enabling quick speaker adaptation without retraining the recognition system.
Findings
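SVCnet models speaker variability by combining low-dimensional, sound-specific encoder models into an overall model of voice variability.
A brief, unconstrained sample of a new speaker's voice is enough to produce a Speaker Voice Code for adaptation, with no retraining of the recognizer.
SVCnet is combined with an MS-TDNN recognizer, which the Speaker Voice Code adapts to the new speaker.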
Abstract
We present SVCnet, a system for modelling speaker variability. Encoder Neural Networks specialized for each speech sound produce low dimensionality models of acoustical variation, and these models are further combined into an overall model of voice variability. A training procedure is described which minimizes the dependence of this model on which sounds have been uttered. Using the trained model (SVCnet) and a brief, unconstrained sample of a new speaker's voice, the system produces a Speaker Voice Code that can be used to adapt a recognition system to the new speaker without retraining. A system which combines SVCnet with an MS-TDNN recognizer is described.
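The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sound inventory, the dimensions, the random linear maps standing in for trained encoder networks, the averaging over frames, and the zero-filling of sounds absent from the sample are all assumptions made for illustration. It only shows how per-sound low-dimensional codes might be combined into a fixed-length Speaker Voice Code, whatever sounds the sample happens to contain.

```python
import random

# All names and sizes below are hypothetical, chosen only for illustration.
FRAME_DIM = 4        # assumed size of one acoustic feature frame
SOUND_CODE_DIM = 2   # assumed low-dimensional code per speech sound
VOICE_CODE_DIM = 3   # assumed size of the final Speaker Voice Code
SOUNDS = ["a", "i", "s"]  # hypothetical sound inventory


def random_matrix(rows, cols, rng):
    """Random linear map standing in for a trained network."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


rng = random.Random(0)
# One encoder per speech sound, as in the abstract; here just random weights.
ENCODERS = {s: random_matrix(SOUND_CODE_DIM, FRAME_DIM, rng) for s in SOUNDS}
# Combiner mapping the concatenated per-sound codes to one voice code.
COMBINER = random_matrix(VOICE_CODE_DIM, SOUND_CODE_DIM * len(SOUNDS), rng)


def speaker_voice_code(frames_by_sound):
    """Build a fixed-length voice code from frames grouped by sound label."""
    combined = []
    for sound in SOUNDS:
        frames = frames_by_sound.get(sound, [])
        if frames:
            codes = [matvec(ENCODERS[sound], frame) for frame in frames]
            combined.extend(mean_vector(codes))
        else:
            # Assumption: a sound missing from the sample contributes zeros.
            combined.extend([0.0] * SOUND_CODE_DIM)
    return matvec(COMBINER, combined)


# An unconstrained sample: the sound "i" never occurs in it.
sample = {
    "a": [[0.1, 0.2, 0.3, 0.4], [0.2, 0.1, 0.0, 0.3]],
    "s": [[0.5, 0.4, 0.1, 0.0]],
}
code = speaker_voice_code(sample)
```

In this sketch the resulting code would be passed to the recognizer as an extra input. The training procedure that makes the code insensitive to which sounds were uttered is not modelled here.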
