Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation
Xiaoqiang Wang, Yanqing Liu, Jinyu Li, Sheng Zhao

TL;DR
This paper enhances contextual spelling correction for speech recognition by integrating external acoustic attention and semantic-aware data augmentation, significantly improving biasing accuracy especially for rare or unseen phrases.
Contribution
It introduces an improved non-autoregressive model that incorporates acoustic information and semantic data augmentation to address previous limitations in biasing accuracy.
Findings
Achieved up to 20.3% relative gain in name recall.
Outperformed baseline systems across various bias list coverage ratios.
Demonstrated stable improvements over previous CSC methods.
Abstract
We previously proposed contextual spelling correction (CSC) to correct the output of end-to-end (E2E) automatic speech recognition (ASR) models with contextual information such as names and places. Although CSC has achieved reasonable improvement on the biasing problem, two drawbacks still limit further accuracy gains. First, because a text-only hypothesis carries limited information, or because the ASR model performs weakly on rare domains, the CSC model may fail to correct phrases with similar pronunciation, or to handle anti-context cases in which none of the biasing phrases is present in the utterance. Second, there is a discrepancy between the training and inference of CSC: the bias list in training is randomly selected, but in inference the ground-truth phrase may be much more similar to the other phrases in the list. To address these limitations, in this paper we propose an improved non-autoregressive…
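The training/inference discrepancy described in the abstract can be illustrated with a small sketch. This is not the authors' augmentation method, which the truncated abstract does not specify. It only contrasts a randomly sampled bias list with one whose distractors resemble the target phrase. The helper names (`similarity`, `random_bias_list`, `hard_bias_list`) and the example names are hypothetical, and surface string similarity from Python's `difflib` stands in for whatever semantic or phonetic similarity a real system would use.

```python
import random
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Surface string similarity in [0, 1].

    Assumption: a stand-in for a real semantic or phonetic similarity score.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def random_bias_list(target, pool, size, rng):
    """Training-style list: the target plus distractors drawn uniformly at random."""
    candidates = [p for p in pool if p != target]
    distractors = rng.sample(candidates, min(size - 1, len(candidates)))
    return [target] + distractors


def hard_bias_list(target, pool, size):
    """Harder list: the target plus the distractors most similar to it."""
    candidates = [p for p in pool if p != target]
    ranked = sorted(candidates, key=lambda p: similarity(target, p), reverse=True)
    return [target] + ranked[: size - 1]


# Hypothetical phrase pool, for illustration only.
pool = ["John Smith", "Jon Smyth", "Joan Smith",
        "Alice Wong", "Pedro Alvarez", "Mei Tanaka"]

rng = random.Random(0)
print(random_bias_list("John Smith", pool, 3, rng))
print(hard_bias_list("John Smith", pool, 3))
```

With random sampling, the distractors are often unrelated to the target, so a model trained this way rarely has to separate near-identical phrases. The similarity-ranked list forces that distinction, which is closer to the inference condition the abstract describes.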
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing
Methods: fail · Attentive Walk-Aggregating Graph Neural Network
