# Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator

**Authors:** Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg

arXiv: 2302.14036 · 2024-07-08

## TL;DR

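This paper introduces an end-to-end ASR system with an integrated text-to-mel-spectrogram generator and a GAN-based enhancer, so the model can be adapted to a new domain using only text data. It improves accuracy over training on transcribed speech alone, and it adapts better and trains faster than cascade TTS systems with a vocoder.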

## Contribution

The integrated model enables domain adaptation with text-only data by generating mel-spectrograms on the fly during training, surpassing cascade TTS systems with a vocoder in both adaptation quality and training speed.

## Key findings

- Text-only domain adaptation significantly improves ASR accuracy over a system trained on transcribed speech only.
- The integrated generator outperforms cascade TTS systems with a vocoder in adaptation quality.
- Training is faster than with cascade TTS systems with a vocoder.

## Abstract

We propose an end-to-end Automatic Speech Recognition (ASR) system that can be trained on transcribed speech data, text-only data, or a mixture of both. The proposed model uses an integrated auxiliary block for text-based training. This block combines a non-autoregressive multi-speaker text-to-mel-spectrogram generator with a GAN-based enhancer to improve the spectrogram quality. The proposed system can generate a mel-spectrogram dynamically during training. It can be used to adapt the ASR model to a new domain by using text-only data from this domain. We demonstrate that the proposed training method significantly improves ASR accuracy compared to the system trained on transcribed speech only. It also surpasses cascade TTS systems with the vocoder in the adaptation quality and training speed.
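The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: every function below is a hypothetical placeholder that only mimics tensor shapes, and the number of mel bins, the frames-per-character ratio, and the random speaker choice are assumptions made for illustration. The point it shows is the routing: transcribed samples reach the ASR model through an ordinary feature extractor, while text-only samples are turned into mel-spectrograms by the generator and enhancer during batch construction, with no vocoder or waveform in between.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

N_MELS = 80  # assumed number of mel bins; not stated in this summary

Spectrogram = List[List[float]]  # frames x mel bins


@dataclass
class Sample:
    text: str
    audio: Optional[List[float]] = None  # None marks a text-only sample


def audio_to_mel(audio: List[float], hop: int = 160) -> Spectrogram:
    """Placeholder feature extractor: one zero frame per `hop` samples."""
    n_frames = max(1, len(audio) // hop)
    return [[0.0] * N_MELS for _ in range(n_frames)]


def text_to_mel(text: str, speaker_id: int, frames_per_char: int = 5) -> Spectrogram:
    """Placeholder for the non-autoregressive multi-speaker generator.

    Emits all frames in one pass; the fixed frames-per-character ratio
    is an assumption standing in for a learned duration model.
    """
    n_frames = max(1, len(text) * frames_per_char)
    return [[float(speaker_id)] * N_MELS for _ in range(n_frames)]


def enhance(mel: Spectrogram) -> Spectrogram:
    """Placeholder for the GAN-based enhancer: same shape in and out."""
    return [[value + 0.1 for value in frame] for frame in mel]


def features_for(sample: Sample, rng: random.Random, n_speakers: int = 4) -> Spectrogram:
    """Route a sample to real or generated mel-spectrogram features."""
    if sample.audio is not None:
        return audio_to_mel(sample.audio)
    # Text-only path: pick a speaker (assumed random) and generate features.
    speaker_id = rng.randrange(n_speakers)
    return enhance(text_to_mel(sample.text, speaker_id))


def build_batch(samples: List[Sample], rng: random.Random) -> List[Tuple[Spectrogram, str]]:
    """Return (features, target text) pairs for a mixed training batch."""
    return [(features_for(s, rng), s.text) for s in samples]


if __name__ == "__main__":
    rng = random.Random(0)
    mixed = [Sample("hello", audio=[0.0] * 1600), Sample("hi there")]
    for mel, text in build_batch(mixed, rng):
        print(text, len(mel), "frames x", len(mel[0]), "mel bins")
```

Because both paths produce the same kind of feature matrix, the ASR model downstream does not need to know which samples were transcribed speech and which were text-only.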

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14036/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/2302.14036/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/2302.14036/full.md

---
Source: https://tomesphere.com/paper/2302.14036