Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation

Xiaoqiang Wang; Yanqing Liu; Jinyu Li; Sheng Zhao

arXiv:2302.11192·cs.SD·February 23, 2023

Open Access

TL;DR

This paper enhances contextual spelling correction for speech recognition by integrating external acoustic attention and semantic-aware data augmentation, significantly improving biasing accuracy especially for rare or unseen phrases.

Contribution

It introduces an improved non-autoregressive model that incorporates acoustic information and semantic data augmentation to address previous limitations in biasing accuracy.

Findings

1. Achieved up to 20.3% relative gain in name recall.

2. Outperformed baseline systems across various bias list coverage ratios.

3. Demonstrated stable improvements over previous CSC methods.
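
A "relative gain" figure such as the one above is conventionally read as the improvement divided by the baseline value, not as a difference in percentage points. The sketch below illustrates that arithmetic under this assumption; the function name `relative_gain` and both recall values are hypothetical and are not taken from the paper.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical recall values, chosen only for illustration (not from the paper).
baseline_recall = 0.60
improved_recall = 0.72

gain = relative_gain(baseline_recall, improved_recall)
print(f"relative gain: {gain:.1%}")
```

With these illustrative numbers the absolute change is 12 recall points, while the relative gain is 12 / 60 of the baseline, which is why the two ways of reporting should not be compared directly.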

Abstract

We previously proposed contextual spelling correction (CSC) to correct the output of end-to-end (E2E) automatic speech recognition (ASR) models with contextual information such as names, places, etc. Although CSC has achieved reasonable improvement on the biasing problem, two drawbacks still limit further accuracy improvement. First, due to the limited information in a text-only hypothesis or the weak performance of the ASR model on rare domains, the CSC model may fail to correct phrases with similar pronunciation, or anti-context cases where none of the biasing phrases is present in the utterance. Second, there is a discrepancy between the training and inference of CSC: the bias list in training is randomly selected, but in inference there may be more similarity between the ground-truth phrase and the other phrases. To solve the above limitations, in this paper we propose an improved non-autoregressive…
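
The training/inference discrepancy described in the abstract can be made concrete with a small sketch. It contrasts a randomly sampled bias list with one whose distractors resemble the ground-truth phrase. This is not the paper's augmentation method: the helper names `random_bias_list` and `similar_bias_list`, the example names, and the use of character-level similarity from `difflib` as a stand-in for pronunciation or semantic similarity are all assumptions made for illustration.

```python
import difflib
import random


def random_bias_list(truth, pool, size, rng):
    """Training-style list: ground truth plus randomly drawn distractors."""
    candidates = [p for p in pool if p != truth]
    distractors = rng.sample(candidates, min(size - 1, len(candidates)))
    return [truth] + distractors


def similar_bias_list(truth, pool, size):
    """Inference-like list: ground truth plus its most similar distractors.

    Character-level similarity is only a rough proxy for the confusable
    phrases the abstract refers to.
    """
    candidates = [p for p in pool if p != truth]

    def similarity(phrase):
        matcher = difflib.SequenceMatcher(None, truth.lower(), phrase.lower())
        return matcher.ratio()

    ranked = sorted(candidates, key=similarity, reverse=True)
    return [truth] + ranked[: size - 1]


# Hypothetical phrase pool, for illustration only.
pool = [
    "Jon Smith", "John Smyth", "Joan Smith",
    "Maria Garcia", "Wei Zhang", "Priya Patel", "John Smith",
]
truth = "John Smith"

random_list = random_bias_list(truth, pool, 3, random.Random(0))
hard_list = similar_bias_list(truth, pool, 3)
print("random:", random_list)
print("similar:", hard_list)
```

A model trained only on lists like `random_list` rarely has to choose between near-identical candidates, whereas a list like `hard_list` forces exactly that choice, which is the mismatch the abstract points to.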

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing
