A Light-weight contextual spelling correction model for customizing transducer-based speech recognition systems
Xiaoqiang Wang, Yanqing Liu, Sheng Zhao, Jinyu Li

TL;DR
This paper presents a lightweight contextual spelling correction model that customizes transducer-based speech recognition with dynamic context information, achieving about 50% relative word error rate reduction over the baseline ASR model and handling out-of-vocabulary terms.
Contribution
The work introduces a lightweight spelling correction model that incorporates context through a shared context encoder and uses a filtering algorithm to handle large context lists, improving ASR accuracy, including on out-of-vocabulary terms.
Findings
About 50% relative word error rate reduction over the baseline ASR model
Outperforms contextual LM biasing methods
Effective on out-of-vocabulary terms
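
As a reminder of what "relative word error rate reduction" means, here is a minimal sketch in Python. The function names, the example sentences, and the 0.20 and 0.10 WER values are illustrative assumptions, not data or code from the paper.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance divided by the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def relative_wer_reduction(baseline_wer, new_wer):
    """Fraction of the baseline WER removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: a misrecognized name in a short utterance.
reference = "please call jinyu li right now"
baseline_hyp = "please call jin you lee right now"
print(wer(reference, baseline_hyp))

# Made-up WER values: going from 20% to 10% is a 50% relative reduction.
print(relative_wer_reduction(0.20, 0.10))
```

A relative reduction is measured against the baseline error rate, so a 50% relative reduction halves the baseline WER rather than subtracting 50 absolute points.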
Abstract
It is challenging to customize a transducer-based automatic speech recognition (ASR) system with context information that is dynamic and unavailable during model training. In this work, we introduce a light-weight contextual spelling correction model to correct context-related recognition errors in transducer-based ASR systems. We incorporate the context information into the spelling correction model with a shared context encoder and use a filtering algorithm to handle large-size context lists. Experiments show that the model improves baseline ASR model performance with about 50% relative word error rate reduction and significantly outperforms baseline methods such as contextual LM biasing. The model also shows excellent performance for out-of-vocabulary terms not seen during training.
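
The abstract does not describe the filtering algorithm itself. The sketch below is a generic stand-in, not the authors' method: it illustrates the general idea of shrinking a large context list to the few phrases that look relevant to an ASR hypothesis before passing them to a correction model. The function names, the `top_k` and `min_score` parameters, and the use of character-level similarity from Python's `difflib` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_window_similarity(phrase, hyp_words):
    """Highest character-level similarity between a context phrase and any
    word window of similar length in the hypothesis (hypothetical heuristic)."""
    n = len(phrase.split())
    best = 0.0
    # Try windows one word shorter, equal, and one word longer than the phrase.
    for size in {max(1, n - 1), n, n + 1}:
        for start in range(max(1, len(hyp_words) - size + 1)):
            window = " ".join(hyp_words[start:start + size])
            score = SequenceMatcher(None, phrase.lower(), window.lower()).ratio()
            best = max(best, score)
    return best


def filter_context_list(hypothesis, phrases, top_k=2, min_score=0.5):
    """Keep at most top_k phrases whose best window similarity is at least
    min_score, ordered from most to least similar."""
    hyp_words = hypothesis.split()
    scored = [(best_window_similarity(p, hyp_words), p) for p in phrases]
    kept = [(s, p) for s, p in scored if s >= min_score]
    kept.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in kept[:top_k]]


# Hypothetical usage: a contact list as the context, one misrecognized name.
hypothesis = "please call jin you lee right now"
contacts = ["Jinyu Li", "Sheng Zhao", "Yanqing Liu", "Xiaoqiang Wang"]
print(filter_context_list(hypothesis, contacts, top_k=1))
```

A filter of this kind trades recall for speed: a stricter `min_score` or smaller `top_k` gives the correction model fewer candidates to attend over, at the risk of dropping the phrase that was actually spoken.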
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling
