Embeddings for DNN speaker adaptive training

Joanna Rownicka; Peter Bell; Steve Renals

arXiv:1909.13537 · cs.CL · October 1, 2019

TL;DR

This paper explores embedding-based speaker adaptation for DNNs in speech recognition, comparing embedding types and adaptation strategies. Its best models reach 4-9% relative WER reductions over baselines.

Contribution

The paper proposes a simplified adaptation approach that applies a single linear layer to speaker embeddings, and it evaluates several embeddings (i-vectors, x-vectors and deep CNN embeddings) for speaker adaptation in DNN-based speech recognition.

Findings

01

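With a good training strategy, a single linear layer that acts on the embeddings to transform the input features is as effective as a multi-layer adaptation network applied to all hidden layers.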

02

Embedding quality for speaker recognition does not directly correlate with ASR performance.

03

Best models achieved 4-9% relative WER reduction over baselines.
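
For readers unfamiliar with the metric, a relative WER reduction is the drop in word error rate divided by the baseline word error rate. The sketch below shows the arithmetic only. The function name and the two WER values are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer, adapted_wer):
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical numbers, for illustration only (not taken from the paper):
# a baseline at 20.0% WER and an adapted model at 18.2% WER.
reduction = relative_wer_reduction(20.0, 18.2)  # about 0.09, i.e. 9% relative
```

A 9% relative reduction in this toy case corresponds to an absolute drop of 1.8 WER points, which is why relative and absolute figures should not be compared directly.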

Abstract

In this work, we investigate the use of embeddings for speaker-adaptive training of DNNs (DNN-SAT) focusing on a small amount of adaptation data per speaker. DNN-SAT can be viewed as learning a mapping from each embedding to transformation parameters that are applied to the shared parameters of the DNN. We investigate different approaches to applying these transformations, and find that with a good training strategy, a multi-layer adaptation network applied to all hidden layers is no more effective than a single linear layer acting on the embeddings to transform the input features. In the second part of our work, we evaluate different embeddings (i-vectors, x-vectors and deep CNN embeddings) in an additional speaker recognition task in order to gain insight into what should characterize an embedding for DNN-SAT. We find the performance for speaker recognition of a given representation…
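
The mapping described in the abstract, from a speaker embedding to transformation parameters applied to the input features, can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. It assumes the linear layer emits a per-dimension scale offset and shift, which the text above does not specify; the function names and toy dimensions are hypothetical, and a real system would use a tensor library and learn the weights jointly with the DNN.

```python
def linear(embedding, weights, bias):
    """Single linear layer on plain lists: out[j] = sum_i embedding[i] * weights[i][j] + bias[j]."""
    return [
        sum(e * row[j] for e, row in zip(embedding, weights)) + bias[j]
        for j in range(len(bias))
    ]


def adapt_features(frames, embedding, weights, bias):
    """Apply a speaker-dependent affine transform to every feature frame.

    Assumption (not from the paper): the layer outputs 2 * feat_dim values,
    read as a per-dimension scale offset followed by a per-dimension shift.
    """
    feat_dim = len(frames[0])
    params = linear(embedding, weights, bias)
    scale = [1.0 + p for p in params[:feat_dim]]  # zero output -> scale of 1
    shift = params[feat_dim:]
    return [[x * s + t for x, s, t in zip(frame, scale, shift)] for frame in frames]


# Hypothetical toy sizes: 2-dim embedding, 3-dim features, 2 frames.
embedding = [1.0, 2.0]
frames = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]
weights = [[0.0] * 6 for _ in range(2)]  # all-zero layer: identity transform
bias = [0.0] * 6
adapted = adapt_features(frames, embedding, weights, bias)
```

With the all-zero layer the features pass through unchanged, so this parameterisation starts from the speaker-independent model. That is a design choice of the sketch, not a detail confirmed by the source.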

Taxonomy

Methods: Linear Layer