Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model

Jennifer Drexler Fox; Natalie Delworth

arXiv:2209.01250·cs.CL·September 7, 2022·1 cites

Open Access

TL;DR

This paper introduces an alternate spelling prediction model that significantly improves the recognition of rare and out-of-vocabulary words in contextual speech recognition, addressing limitations of existing biasing techniques.

Contribution

The paper proposes a novel, simpler spelling prediction model that enhances rare word recall without requiring pronunciation dictionaries or TTS systems.

Findings

01

Recall of rare words improved by 34.7% relative

02

Recall of out-of-vocabulary words improved by 97.2% relative

03

Baseline results show that existing shallow fusion biasing does not adequately handle words rarely or never seen during training
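
The abstract reports these gains as relative, not absolute, improvements. The short sketch below shows the arithmetic behind a relative gain. The recall values in it are hypothetical and chosen only so the result works out to 34.7%; they are not figures from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative change from baseline to improved, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical recall values (NOT from the paper), used only to
# illustrate how a relative gain is computed.
baseline_recall = 0.40
improved_recall = 0.5388

rel = relative_improvement(baseline_recall, improved_recall)
print(f"Relative recall improvement: {rel:.1%}")
```

With these made-up values the absolute gain is about 13.9 points of recall, while the relative gain is 34.7%. The same relative figure can correspond to very different absolute gains depending on the baseline.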

Abstract

Contextual ASR, which takes a list of bias terms as input along with audio, has drawn recent interest as ASR use becomes more widespread. We are releasing contextual biasing lists to accompany the Earnings21 dataset, creating a public benchmark for this task. We present baseline results on this benchmark using a pretrained end-to-end ASR model from the WeNet toolkit. We show results for shallow fusion contextual biasing applied to two different decoding algorithms. Our baseline results confirm observations that end-to-end models struggle in particular with words that are rarely or never seen during training, and that existing shallow fusion techniques do not adequately address this problem. We propose an alternate spelling prediction model that improves recall of rare words by 34.7% relative and of out-of-vocabulary words by 97.2% relative, compared to contextual biasing without…
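
As a rough illustration of the shallow fusion biasing idea mentioned in the abstract, the sketch below adds a fixed bonus to a hypothesis score for each bias term it contains. This is a simplified assumption, not the paper's or WeNet's implementation: real decoders typically apply the bonus incrementally during beam search, whereas this sketch rescores complete hypotheses. The function names, the `bonus` value, the example term, and the log-probabilities are all hypothetical.

```python
from typing import List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens, ASR log-probability)


def biased_score(tokens: List[str], log_prob: float,
                 bias_terms: List[List[str]], bonus: float) -> float:
    """Add `bonus` per token of every bias term that occurs in `tokens`."""
    score = log_prob
    for term in bias_terms:
        n = len(term)
        # Slide a window over the hypothesis to find exact matches of the term.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == term:
                score += bonus * n
    return score


def pick_best(hyps: List[Hypothesis], bias_terms: List[List[str]],
              bonus: float = 2.0) -> Hypothesis:
    """Return the hypothesis with the highest biased score."""
    return max(hyps, key=lambda h: biased_score(h[0], h[1], bias_terms, bonus))


# Hypothetical example: the unbiased model prefers the split spelling.
hyps = [
    ("revenue at acme corp grew".split(), -4.0),
    ("revenue at acmecorp grew".split(), -5.5),
]
bias_terms = [["acmecorp"]]  # hypothetical bias list entry

best = pick_best(hyps, bias_terms)
print(" ".join(best[0]))
```

Note that the bonus can only help if the biased spelling already appears among the candidate hypotheses. If the model never proposes the rare word's spelling at all, there is nothing for the bonus to promote, which is in line with the abstract's observation that shallow fusion alone struggles with words rarely or never seen during training.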

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing