Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model
Jennifer Drexler Fox, Natalie Delworth

TL;DR
This paper introduces an alternate spelling prediction model that improves recall of rare words by 34.7% relative and of out-of-vocabulary words by 97.2% relative in contextual speech recognition, addressing limitations of existing shallow fusion biasing techniques.
Contribution
The paper proposes a novel, simpler spelling prediction model that enhances rare word recall without requiring pronunciation dictionaries or TTS systems.
Findings
Recall of rare words improved by 34.7% relative
Recall of out-of-vocabulary words improved by 97.2% relative
Baseline results highlight limitations of current shallow fusion biasing techniques
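The recall figures in the abstract are relative improvements, not absolute percentage-point gains. The short Python sketch below shows how such a figure is computed. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def recall(hits: int, total: int) -> float:
    """Fraction of reference bias terms that were recognized correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative change from baseline to improved, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Invented counts for illustration only (NOT results from the paper).
baseline_recall = recall(300, 1000)   # 30.0% absolute recall
improved_recall = recall(404, 1000)   # 40.4% absolute recall

gain = relative_improvement(baseline_recall, improved_recall)
print(f"absolute gain: {(improved_recall - baseline_recall) * 100:.1f} points")
print(f"relative gain: {gain:.1f}%")
```

With these invented counts, a gain of about 10 absolute points corresponds to a relative gain of roughly 35%, which is why a relative figure should always be read alongside the baseline it is measured against.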
Abstract
Contextual ASR, which takes a list of bias terms as input along with audio, has drawn recent interest as ASR use becomes more widespread. We are releasing contextual biasing lists to accompany the Earnings21 dataset, creating a public benchmark for this task. We present baseline results on this benchmark using a pretrained end-to-end ASR model from the WeNet toolkit. We show results for shallow fusion contextual biasing applied to two different decoding algorithms. Our baseline results confirm observations that end-to-end models struggle in particular with words that are rarely or never seen during training, and that existing shallow fusion techniques do not adequately address this problem. We propose an alternate spelling prediction model that improves recall of rare words by 34.7% relative and of out-of-vocabulary words by 97.2% relative, compared to contextual biasing without…
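As a rough illustration of the shallow fusion idea the abstract refers to, the sketch below adds a bonus to hypotheses that contain bias terms. It is a simplified n-best rescoring stand-in, assuming a fixed per-token bonus; in actual shallow fusion the bonus is applied token by token during beam search, and this is not the WeNet implementation or the authors' method. All names (`matched_bias_tokens`, `rescore`, `bias_weight`) and all scores are hypothetical.

```python
from typing import List, Sequence, Tuple

Hypothesis = Tuple[float, List[str]]  # (ASR log-probability, tokens)


def matched_bias_tokens(tokens: Sequence[str],
                        bias_phrases: Sequence[Sequence[str]]) -> int:
    """Count tokens covered by bias phrases, taking the longest match at each position."""
    matched = 0
    i = 0
    while i < len(tokens):
        best = 0
        for phrase in bias_phrases:
            n = len(phrase)
            if n > best and list(tokens[i:i + n]) == list(phrase):
                best = n
        if best:
            matched += best
            i += best  # skip past the matched phrase
        else:
            i += 1
    return matched


def rescore(nbest: Sequence[Hypothesis],
            bias_phrases: Sequence[Sequence[str]],
            bias_weight: float = 1.0) -> List[str]:
    """Return the tokens of the hypothesis with the best fused score.

    Fused score = ASR log-probability + bias_weight * matched bias tokens.
    """
    scored = [
        (logprob + bias_weight * matched_bias_tokens(tokens, bias_phrases), tokens)
        for logprob, tokens in nbest
    ]
    return max(scored, key=lambda item: item[0])[1]


# Invented example: the unbiased model prefers a common-word spelling.
nbest = [
    (-4.0, ["we", "used", "we", "net"]),
    (-4.5, ["we", "used", "wenet"]),
]
bias_list = [["wenet"]]

print(rescore(nbest, bias_list, bias_weight=0.0))  # no biasing
print(rescore(nbest, bias_list, bias_weight=1.0))  # with biasing
```

The sketch also hints at the limitation the abstract describes: a bonus can only promote a bias term that already appears among the candidate hypotheses, so a rare word the model never proposes receives no help from the bias score alone.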
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
