Integrating Categorical Features in End-to-End ASR

Rongqing Huang

arXiv:2110.03047 · cs.CL · October 8, 2021

TL;DR

This paper proposes a method to incorporate categorical features into end-to-end speech recognition models, improving their ability to leverage diverse and limited data sources across languages, domains, and dialects.

Contribution

It introduces a simple approach to integrate categorical features into E2E ASR models, enhancing multi-domain and low-resource language recognition capabilities.

Findings

01

Joint models with categorical features outperform independent models.

02

Incorporating categorical data improves recognition accuracy across languages and domains.

03

Detailed analysis of training strategies validates the effectiveness of the proposed method.

Abstract

All-neural, end-to-end (E2E) ASR systems have gained rapid interest from the speech recognition community. Such systems convert speech input to text units using a single trainable neural network model. E2E models require large amounts of paired speech-text data, which is expensive to obtain. The amount of data available varies across languages and dialects, and it is critical to make use of all these data so that both low-resource and high-resource languages can be improved. When deploying an ASR system for a new application domain, the amount of domain-specific training data is typically very limited, so leveraging data from existing domains is important for ASR accuracy in the new domain. In this paper, we treat all these aspects as categorical information in an ASR system, and propose a simple yet effective way to integrate categorical features into an E2E model. We perform…
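The abstract here is truncated before it describes the integration mechanism, so the following is only an illustrative sketch of one common way to feed categorical information (language, dialect, domain) to a model: appending a one-hot category vector to every acoustic frame. It is not confirmed to be the paper's method. The category names, the `one_hot` and `append_category` helpers, and the toy frame values are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical category inventory (assumption: the paper's actual
# languages, dialects, and domains are not listed on this page).
CATEGORIES = ["en-US", "en-GB", "voice-search", "dictation"]


def one_hot(category: str, inventory: Sequence[str] = CATEGORIES) -> List[float]:
    """Return a one-hot vector marking `category` within `inventory`."""
    if category not in inventory:
        raise ValueError(f"unknown category: {category!r}")
    return [1.0 if name == category else 0.0 for name in inventory]


def append_category(
    frames: Sequence[Sequence[float]], category: str
) -> List[List[float]]:
    """Concatenate the same one-hot category vector onto every frame."""
    tag = one_hot(category)
    return [list(frame) + tag for frame in frames]


# Two toy 3-dimensional "acoustic" frames, tagged as British English.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
tagged = append_category(frames, "en-GB")
```

Under this assumed scheme, a single joint model sees utterances from every category during training, and the appended vector tells it which category each utterance belongs to. A learned embedding lookup in place of the one-hot vector would be a natural variation.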

Taxonomy

Topics: Fault Detection and Control Systems · Speech Recognition and Synthesis · Machine Learning and Algorithms