# Speaker Adaptation for End-to-End CTC Models

**Authors:** Ke Li, Jinyu Li, Yong Zhao, Kshitiz Kumar, Yifan Gong

arXiv: 1901.01239 · 2019-01-07

## TL;DR

This paper proposes two speaker adaptation methods for end-to-end (E2E) CTC speech recognition models, Kullback-Leibler divergence (KLD) regularization and multi-task learning (MTL), and shows that MTL yields larger word error rate reductions than KLD regularization on the Microsoft short message dictation task.

## Contribution

The paper applies two speaker adaptation techniques, KLD regularization and multi-task learning (MTL), to E2E CTC models with three different types of output units, and reports empirical results favoring MTL.

## Key findings

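- MTL outperforms KLD regularization for speaker adaptation on the Microsoft short message dictation task.
- Supervised MTL adaptation achieves relative word error rate reductions (WERRs) of 8.8% for the word CTC model and 9.6% for the mix-unit CTC model.
- Unsupervised MTL adaptation achieves relative WERRs of 4.0% for the word CTC model and 3.8% for the mix-unit CTC model.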
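
For reference, a relative WERR compares the adapted model's word error rate (WER) against that of the unadapted baseline. The sketch below shows only the arithmetic; the function name and the WER values are hypothetical, chosen to illustrate the calculation, and are not figures from the paper.

```python
def relative_werr(baseline_wer: float, adapted_wer: float) -> float:
    """Relative word error rate reduction, expressed in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WERs (not from the paper): a baseline at 20.0% WER
# adapted down to 18.0% WER corresponds to a 10% relative reduction.
example = relative_werr(20.0, 18.0)
```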

## Abstract

We propose two approaches for speaker adaptation in end-to-end (E2E) automatic speech recognition systems. One is Kullback-Leibler divergence (KLD) regularization and the other is multi-task learning (MTL). Both approaches aim to address the data sparsity especially output target sparsity issue of speaker adaptation in E2E systems. The KLD regularization adapts a model by forcing the output distribution from the adapted model to be close to the unadapted one. The MTL utilizes a jointly trained auxiliary task to improve the performance of the main task. We investigated our approaches on E2E connectionist temporal classification (CTC) models with three different types of output units. Experiments on the Microsoft short message dictation task demonstrated that MTL outperforms KLD regularization. In particular, the MTL adaptation obtained 8.8% and 4.0% relative word error rate reductions (WERRs) for supervised and unsupervised adaptations for the word CTC model, and 9.6% and 3.8% relative WERRs for the mix-unit CTC model, respectively.
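
The two objectives described in the abstract can be sketched generically as follows. This summary does not give the paper's exact formulation for CTC, so the code is a minimal illustration under stated assumptions: the interpolation weights `rho` and `aux_weight`, the weighted-sum form of both losses, and all function names are hypothetical, and `task_loss`, `main_loss`, and `aux_loss` stand in for loss values (for example CTC losses) computed elsewhere.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one frame of output logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_kld(unadapted_logits: Sequence[float],
              adapted_logits: Sequence[float]) -> float:
    """KL(p_unadapted || p_adapted) for a single frame."""
    p = softmax(unadapted_logits)
    q = softmax(adapted_logits)
    # Clamp q to avoid division by zero if a probability underflows.
    return sum(pi * math.log(pi / max(qi, 1e-12))
               for pi, qi in zip(p, q) if pi > 0.0)


def kld_regularized_loss(task_loss: float,
                         unadapted_frames: Sequence[Sequence[float]],
                         adapted_frames: Sequence[Sequence[float]],
                         rho: float) -> float:
    """Assumed form: interpolate the adaptation loss with a mean per-frame
    KLD penalty that keeps the adapted outputs close to the unadapted ones."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    klds = [frame_kld(u, a) for u, a in zip(unadapted_frames, adapted_frames)]
    mean_kld = sum(klds) / len(klds)
    return (1.0 - rho) * task_loss + rho * mean_kld


def mtl_loss(main_loss: float, aux_loss: float, aux_weight: float) -> float:
    """Assumed form: weighted sum of the main-task and auxiliary-task losses."""
    return (1.0 - aux_weight) * main_loss + aux_weight * aux_loss
```

With `rho = 0` the first objective reduces to plain fine-tuning on the adaptation data, while `rho = 1` ignores the adaptation loss and only penalizes drift from the unadapted model; intermediate values trade the two off.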

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.01239/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1901.01239/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/1901.01239/full.md

---
Source: https://tomesphere.com/paper/1901.01239