A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data

Raviraj Joshi; Anupam Singh

arXiv:2206.13240·eess.AS·June 28, 2022

TL;DR

This paper introduces a simple, low-cost domain adaptation method for end-to-end ASR systems: text-only data from the target domain is converted into synthetic speech with a TTS engine, and only the final dense layer of a generic ASR model is fine-tuned on the resulting pairs.

Contribution

The paper proposes adapting end-to-end ASR models with synthetic TTS data and minimal fine-tuning, an approach that requires only text data from the target domain.
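As a rough illustration of the text-to-parallel-data step, the sketch below pairs each target-domain sentence with synthesized audio. It is not the authors' pipeline: `fake_tts` is a hypothetical stand-in that emits a sine tone instead of speech, `build_parallel_corpus` and `Waveform` are names invented here, and the whitespace normalization is an assumption. In practice a real single-speaker TTS engine would be passed in place of `fake_tts`.

```python
import math
from typing import Callable, Iterable, List, Tuple

Waveform = List[float]  # hypothetical alias: mono samples in [-1, 1]


def fake_tts(text: str, sample_rate: int = 16000) -> Waveform:
    """Hypothetical stand-in for a TTS engine: 10 ms of a 220 Hz tone per character."""
    n_samples = len(text) * sample_rate // 100
    return [math.sin(2 * math.pi * 220 * t / sample_rate) for t in range(n_samples)]


def build_parallel_corpus(
    sentences: Iterable[str],
    synthesize: Callable[[str], Waveform],
) -> List[Tuple[Waveform, str]]:
    """Turn a text-only corpus into (audio, transcript) pairs."""
    pairs = []
    for sentence in sentences:
        text = " ".join(sentence.split())  # collapse whitespace (assumed normalization)
        if not text:
            continue  # skip empty lines
        pairs.append((synthesize(text), text))
    return pairs


corpus = ["Book a cab to the airport", "   ", "Cancel my  last order"]
pairs = build_parallel_corpus(corpus, fake_tts)
```

Passing the synthesizer as a callable keeps the pairing logic independent of any particular TTS engine, so the same function would work with whichever engine is available.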

Findings

01

Synthetic TTS data reduces word error rate in target domains.

02

Fine-tuning only the final dense layer is effective for domain adaptation.

03

The method works with both CTC and attention-based models.
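To make the "fine-tune only the final dense layer" idea concrete, here is a toy sketch in plain Python. It is an assumption-laden stand-in rather than the paper's setup: a single frozen tanh layer plays the role of the pretrained ASR stack, the task is a two-class softmax classification instead of CTC or attention decoding, and all names (`encode`, `fine_tune_final_layer`, the toy data) are invented for illustration. The point is only that gradient updates touch the output layer while the encoder weights stay untouched.

```python
import math
import random


def encode(x, enc_w):
    """Frozen 'encoder': one tanh layer standing in for the pretrained model body."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in enc_w]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def forward(x, y, enc_w, out_w, out_b):
    """Return (cross-entropy loss, class probabilities, encoder features)."""
    h = encode(x, enc_w)
    logits = [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(out_w, out_b)]
    p = softmax(logits)
    return -math.log(p[y]), p, h


def fine_tune_final_layer(data, enc_w, out_w, out_b, lr=0.2, epochs=100):
    """Plain SGD on out_w and out_b only; enc_w is read but never written."""
    for _ in range(epochs):
        for x, y in data:
            _, p, h = forward(x, y, enc_w, out_w, out_b)
            for k in range(len(out_w)):
                grad_logit = p[k] - (1.0 if k == y else 0.0)
                for j in range(len(h)):
                    out_w[k][j] -= lr * grad_logit * h[j]
                out_b[k] -= lr * grad_logit


def mean_loss(data, enc_w, out_w, out_b):
    return sum(forward(x, y, enc_w, out_w, out_b)[0] for x, y in data) / len(data)


rng = random.Random(0)
enc_w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # "pretrained" weights
out_w = [[0.0] * 4 for _ in range(2)]  # final dense layer, the only trainable part
out_b = [0.0, 0.0]
data = [([1.0, 0.0, 0.5], 0), ([0.0, 1.0, -0.5], 1),
        ([0.9, 0.1, 0.4], 0), ([0.1, 0.8, -0.6], 1)]  # toy "target-domain" examples

enc_before = [row[:] for row in enc_w]
loss_before = mean_loss(data, enc_w, out_w, out_b)
fine_tune_final_layer(data, enc_w, out_w, out_b)
loss_after = mean_loss(data, enc_w, out_w, out_b)
```

In a deep learning framework the same effect is usually obtained by marking every parameter except the output layer as non-trainable before running the optimizer.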

Abstract

Automatic Speech Recognition (ASR) has been dominated by deep learning-based end-to-end speech recognition models. These approaches require large amounts of labeled data in the form of audio-text pairs. Moreover, these models are more susceptible to domain shift than traditional models. It is common practice to train generic ASR models and then adapt them to target domains using comparatively smaller data sets. We consider a more extreme case of domain adaptation where only a text corpus is available. In this work, we propose a simple baseline technique for domain adaptation in end-to-end speech recognition models. We convert the text-only corpus to audio data using a single-speaker Text to Speech (TTS) engine. The resulting parallel data in the target domain is then used to fine-tune the final dense layer of generic ASR models. We show that single speaker synthetic TTS data coupled with…
