Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network

Janne Pylkkönen (1), Antti Ukkonen (1, 2), Juho Kilpikoski (1), Samu Tamminen (1), Hannes Heikinheimo (1) ((1) Speechly, (2) Department of Computer Science, University of Helsinki, Finland)

arXiv:2104.11127·cs.CL·June 10, 2021


TL;DR

This paper demonstrates a fast, effective method for domain adaptation of RNN-Transducer speech recognition models using only small amounts of textual data, avoiding complex fusion methods and external language models.

Contribution

It introduces a novel approach to adapt RNN-Transducer models with minimal data by leveraging the prediction network as a language model, simplifying the adaptation process.

Findings

1. Achieves 10-45% relative WER reduction across multiple tasks.
2. Enables quick adaptation without external language models.
3. Provides insights into the prediction network's language modeling capabilities.
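For readers unfamiliar with the metric in the first finding, a relative WER reduction is measured against the baseline word error rate rather than in absolute percentage points. The numbers in the worked example below are hypothetical, chosen only to illustrate the arithmetic; they are not results from the paper.

```latex
\[
\text{relative WER reduction}
  = \frac{\mathrm{WER}_{\text{base}} - \mathrm{WER}_{\text{adapted}}}{\mathrm{WER}_{\text{base}}},
\qquad
\text{e.g.}\quad \frac{20.0\% - 15.0\%}{20.0\%} = 25\%.
\]
```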

Abstract

Adaptation of end-to-end speech recognition systems to new tasks is known to be challenging. A number of solutions have been proposed which apply external language models with various fusion methods, possibly in combination with two-pass decoding. TTS systems have also been used to generate adaptation data for end-to-end models. In this paper we show that RNN-transducer models can be effectively adapted to new domains using only small amounts of textual data. By taking advantage of the model's inherent structure, where the prediction network is interpreted as a language model, we can apply fast adaptation to the model. Adapting the model avoids the need for complicated decoding-time fusions and external language models. Using appropriate regularization, the prediction network can be adapted to new domains while still retaining good generalization capabilities. We show with multiple ASR…
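The idea in the abstract, treating the prediction network as a language model and fine-tuning it on target-domain text under regularization, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `TinyPredictionLM` is a hypothetical bigram-style stand-in for a real recurrent prediction network, and the regularizer (an L2 penalty pulling weights back toward their pre-adaptation values, controlled by the hypothetical parameter `l2_to_init`) is just one plausible reading of "appropriate regularization".

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyPredictionLM:
    """Toy stand-in for a prediction network (hypothetical).

    Row W[prev] holds the next-token logits given the previous token.
    """

    def __init__(self, vocab_size, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.W = [
            [rng.uniform(-0.1, 0.1) for _ in range(vocab_size)]
            for _ in range(vocab_size)
        ]

    def nll(self, seqs):
        """Average next-token negative log-likelihood over token-id sequences."""
        total, count = 0.0, 0
        for seq in seqs:
            for prev, nxt in zip(seq, seq[1:]):
                total -= math.log(softmax(self.W[prev])[nxt])
                count += 1
        return total / count


def adapt(model, target_seqs, lr=0.5, l2_to_init=0.1, epochs=50):
    """Fine-tune on target-domain text only, with SGD.

    Assumed regularizer: an L2 penalty toward the original weights, so the
    adapted model cannot drift arbitrarily far from its starting point.
    """
    init = [row[:] for row in model.W]  # frozen copy of pre-adaptation weights
    for _ in range(epochs):
        for seq in target_seqs:
            for prev, nxt in zip(seq, seq[1:]):
                probs = softmax(model.W[prev])
                for j in range(model.vocab_size):
                    # Cross-entropy gradient w.r.t. logit j.
                    grad = probs[j] - (1.0 if j == nxt else 0.0)
                    # Gradient of 0.5 * l2_to_init * (w - w_init)^2.
                    grad += l2_to_init * (model.W[prev][j] - init[prev][j])
                    model.W[prev][j] -= lr * grad
    return model


# Usage: token-id sequences standing in for a small target-domain text corpus.
target_text = [[0, 1, 2, 3], [0, 1, 2, 3], [1, 2, 3, 0]]
lm = TinyPredictionLM(vocab_size=4)
nll_before = lm.nll(target_text)
adapt(lm, target_text)
nll_after = lm.nll(target_text)
print(f"target-domain NLL before: {nll_before:.3f}, after: {nll_after:.3f}")
```

Raising `l2_to_init` keeps the adapted weights closer to the original model at the cost of a weaker fit to the target text, which is the trade-off between adaptation and generalization that the abstract alludes to. No audio and no external language model are involved at any point in this sketch.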
