Improving Performance of End-to-End ASR on Numeric Sequences
Cal Peyser, Hao Zhang, Tara N. Sainath, Zelin Wu

TL;DR
This paper explores methods to enhance end-to-end speech recognition accuracy for numeric sequences on low-memory devices, using synthetic data and neural denormalization to address out-of-vocabulary issues.
Contribution
It introduces techniques that combine synthetic numeric data generation with neural denormalization to improve E2E ASR performance on numeric sequences in low-memory, on-device settings.
Findings
WER reduced by up to a factor of 8 for long numeric sequences
Synthetic data generation improves numeric recognition accuracy
Neural denormalizer handles spoken-to-written conversion without the large memory footprint of an FST denormer
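
For readers unfamiliar with the WER figure cited above: word error rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition; the function name `wer` and the whitespace tokenization are assumptions for illustration, and it is not the scoring tool used in the paper.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Illustrative only: tokenizes on whitespace and applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the hypothesis prefix hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition a written-domain token such as `$1.25` is a single word, so one wrong digit turns the whole token into an error: `wer("i need $1.25", "i need $1.35")` scores one error out of three reference words.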
Abstract
Recognizing written domain numeric utterances (e.g. I need $1.25.) can be challenging for ASR systems, particularly when numeric sequences are not seen during training. This out-of-vocabulary (OOV) issue is addressed in conventional ASR systems by training part of the model on spoken domain utterances (e.g. I need one dollar and twenty five cents.), for which numeric sequences are composed of in-vocabulary numbers, and then using an FST verbalizer to denormalize the result. Unfortunately, conventional ASR models are not suitable for the low memory setting of on-device speech recognition. E2E models such as RNN-T are attractive for on-device ASR, as they fold the AM, PM and LM of a conventional model into one neural network. However, in the on-device setting the large memory footprint of an FST denormer makes spoken domain training more difficult. In this paper, we investigate techniques…
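
To make the spoken-to-written mapping in the abstract concrete, here is a toy rule-based sketch that converts a spoken-domain currency phrase into its written form. It is neither the FST verbalizer nor the neural denormalizer discussed in the paper; the names `words_to_int` and `denormalize_currency` are hypothetical, and the sketch assumes lowercased, punctuation-free input with amounts below one thousand.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
NUMBER_WORDS = set(UNITS) | set(TENS) | {"hundred"}


def words_to_int(words):
    """Convert number words below one thousand, e.g. ['twenty', 'five'] -> 25.

    Toy logic: word order is not validated.
    """
    total = 0
    for word in words:
        if word in UNITS:
            total += UNITS[word]
        elif word in TENS:
            total += TENS[word]
        elif word == "hundred":
            total = max(total, 1) * 100
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total


def end_of_number_run(tokens, start):
    """Return the index just past the run of number words beginning at start."""
    end = start
    while end < len(tokens) and tokens[end] in NUMBER_WORDS:
        end += 1
    return end


def denormalize_currency(text):
    """Rewrite '<n> dollars [and <m> cents]' as '$n[.mm]'; leave other tokens as-is."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        j = end_of_number_run(tokens, i)
        is_dollar_amount = (
            j > i and j < len(tokens) and tokens[j] in ("dollar", "dollars")
        )
        if not is_dollar_amount:
            # Not a currency phrase: copy the token(s) through unchanged.
            out.extend(tokens[i:max(j, i + 1)])
            i = max(j, i + 1)
            continue
        dollars = words_to_int(tokens[i:j])
        cents = 0
        i = j + 1
        # Optional tail: "and <number words> cent(s)".
        if i < len(tokens) and tokens[i] == "and":
            m = end_of_number_run(tokens, i + 1)
            if m > i + 1 and m < len(tokens) and tokens[m] in ("cent", "cents"):
                cents = words_to_int(tokens[i + 1:m])
                i = m + 1
        out.append(f"${dollars}.{cents:02d}" if cents else f"${dollars}")
    return " ".join(out)


print(denormalize_currency("i need one dollar and twenty five cents"))
```

Hand-written rules like these cover only the patterns they enumerate; a production verbalizer has to cover many numeric classes, which is part of why the abstract describes the FST denormer's memory footprint as a problem on device.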
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Handwritten Text Recognition Techniques
Methods: Attention Model
