Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems

Xiaoqiang Wang; Yanqing Liu; Jinyu Li; Veljko Miljanic; Sheng Zhao; Hosam Khalil

arXiv:2203.00888 · cs.CL · September 8, 2022

TL;DR

This paper introduces a novel contextual spelling correction model for end-to-end speech recognition that improves accuracy by effectively biasing recognition towards specific context phrases, outperforming traditional methods.

Contribution

A new domain-insensitive, sequence-to-sequence contextual biasing approach with autoregressive and non-autoregressive mechanisms for end-to-end ASR systems.

Findings

01. Achieves up to 51% relative WER reduction.
02. NAR model reduces size by 43.2% and speeds up inference by 2.1 times.
03. Outperforms traditional biasing methods.
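
For readers unfamiliar with the metric, a relative WER reduction is the drop in word error rate expressed as a fraction of the baseline WER, not a difference in absolute percentage points. The sketch below illustrates the arithmetic only. The function name and the baseline and corrected WER values are hypothetical, chosen so the result matches the 51% headline figure; they are not numbers reported in the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values for illustration, NOT results from the paper:
# a baseline ASR system at 10.0% WER and a corrected system at 4.9% WER.
baseline = 10.0
corrected = 4.9

reduction = relative_wer_reduction(baseline, corrected)
print(f"Relative WER reduction: {reduction:.0%}")
```

Under these assumed values the absolute improvement is 5.1 WER points, which corresponds to a 51% relative reduction.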

Abstract

Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems. It aims to achieve better recognition performance by biasing the ASR system towards particular context phrases such as person names, music lists, and proper nouns. Existing methods mainly include contextual LM biasing and adding a bias encoder to end-to-end ASR models. In this work, we introduce a novel approach to contextual biasing that adds a contextual spelling correction model on top of the end-to-end ASR system. We incorporate contextual information into a sequence-to-sequence spelling correction model with a shared context encoder. Our proposed model includes two different mechanisms: autoregressive (AR) and non-autoregressive (NAR). We propose filtering algorithms to handle large-size context lists, and performance balancing mechanisms to control the biasing degree…

Code & Models

Repositories

zombbie/entity-synthetic-dataset (Official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing