Toward Cross-Domain Speech Recognition with End-to-End Models

Thai-Son Nguyen; Sebastian Stüker; Alex Waibel

arXiv:2003.04194·eess.AS·March 10, 2020·5 cites

Open Access

TL;DR

This paper shows empirically that neural end-to-end speech recognition models generalize better than hybrid models when trained on multi-domain data, matching the accuracy of domain-specific hybrid models without requiring domain-adapted language models.

Contribution

The study provides empirical evidence that end-to-end models generalize better across multiple domains than hybrid models, simplifying multi-domain speech recognition.

Findings

01 End-to-end models generalize better than hybrid models across diverse domains.

02 Multi-domain end-to-end models match the performance of domain-specific hybrid models.

03 End-to-end models eliminate the need for domain-adapted language models.

Abstract

In the area of multi-domain speech recognition, past research focused on hybrid acoustic models to build cross-domain and domain-invariant speech recognition systems. In this paper, we empirically examine the difference in behavior between hybrid acoustic models and neural end-to-end systems when mixing acoustic training data from several domains. For these experiments, we composed a multi-domain dataset from public sources, with the different domains in the corpus covering a wide variety of topics and acoustic conditions such as telephone conversations, lectures, read speech and broadcast news. We show that for the hybrid models, supplying additional training data from other domains with mismatched acoustic conditions does not increase the performance on specific domains. However, our end-to-end models optimized with a sequence-based criterion generalize better than the hybrid…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing