Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

Kaixun Huang; Ao Zhang; Binbin Zhang; Tianyi Xu; Xingchen Song; Lei Xie

arXiv:2310.04657 · eess.AS · October 10, 2023

TL;DR

This paper introduces a spike-triggered deep biasing method for end-to-end Mandarin speech recognition that combines explicit and implicit contextual biasing, substantially reducing the character error rate (CER), especially on biased phrases.

Contribution

The paper proposes a spike-triggered deep biasing approach that supports both explicit and implicit bias, improving the recognition accuracy of end-to-end Mandarin ASR systems.

Findings

1. 32.0% relative CER reduction overall
2. 68.6% relative CER reduction on contextual phrases
3. Further gains when cascaded with shallow fusion methods
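The findings are stated as relative CER reductions. Assuming the standard definitions (the page does not spell them out), CER counts character substitutions S, deletions D and insertions I against N reference characters, and the relative reduction compares a system's CER with the baseline's:

```latex
\mathrm{CER} = \frac{S + D + I}{N}, \qquad
\Delta_{\text{rel}} = \frac{\mathrm{CER}_{\text{base}} - \mathrm{CER}_{\text{new}}}{\mathrm{CER}_{\text{base}}} \times 100\%
```

Under this definition, a 32.0% relative reduction means the new CER is 0.68 times the baseline CER, whatever the baseline's absolute value.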

Abstract

The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias. In this study, we introduce a spike-triggered deep biasing method that simultaneously supports both explicit and implicit bias. Moreover, both bias approaches exhibit significant improvements and can be cascaded with shallow fusion methods for better results. Furthermore, we propose a context sampling enhancement strategy and improve the contextual phrase filtering algorithm. Experiments on the public WenetSpeech Mandarin biased-word dataset show a 32.0% relative CER…
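The abstract contrasts shallow fusion, which biases the ASR posterior directly, with deep biasing, and describes bias that is triggered on spikes. The toy below is a minimal sketch of how those two ideas can interact, not the authors' implementation: it does not model the paper's attention-based deep biasing at all. It assumes a CTC model whose posteriors are dominated by blank, defines "spike" frames as frames whose blank probability falls below a hypothetical `blank_threshold`, and adds a flat shallow-fusion `bonus` to the log-probability of any token that extends a context phrase, only at those frames. All names (`spike_frames`, `phrase_prefixes`, `extends_phrase`, `spike_triggered_greedy`, `VOCAB`, `POSTERIORS`) and the toy posteriors are hypothetical.

```python
import math
from typing import List, Sequence, Set

BLANK = 0  # assumption: index 0 of the vocabulary is the CTC blank


def spike_frames(posteriors: Sequence[Sequence[float]], blank_threshold: float = 0.5) -> List[int]:
    """Indices of frames whose blank probability is below the threshold (hypothetical spike rule)."""
    return [t for t, frame in enumerate(posteriors) if frame[BLANK] < blank_threshold]


def phrase_prefixes(phrases: Sequence[str]) -> Set[str]:
    """Every non-empty prefix of every context phrase (a flat stand-in for a prefix trie)."""
    return {p[:i] for p in phrases for i in range(1, len(p) + 1)}


def extends_phrase(hyp: str, token: str, prefixes: Set[str]) -> bool:
    """True if some suffix of hyp + token is a prefix of a context phrase."""
    cand = hyp + token
    return any(cand[i:] in prefixes for i in range(len(cand)))


def spike_triggered_greedy(
    posteriors: Sequence[Sequence[float]],
    vocab: Sequence[str],
    phrases: Sequence[str],
    blank_threshold: float = 0.5,
    bonus: float = 1.0,
) -> str:
    """Greedy CTC decoding with a shallow-fusion bonus applied only at spike frames."""
    prefixes = phrase_prefixes(phrases)
    spikes = set(spike_frames(posteriors, blank_threshold))
    hyp, prev = "", BLANK
    for t, frame in enumerate(posteriors):
        if t not in spikes:
            prev = BLANK  # simplification: non-spike frames are treated as blank
            continue
        scores = []
        for k, p in enumerate(frame):
            score = math.log(p) if p > 0 else -math.inf
            if k != BLANK and extends_phrase(hyp, vocab[k], prefixes):
                score += bonus  # explicit bias added in the log domain
            scores.append(score)
        best = max(range(len(scores)), key=scores.__getitem__)
        if best != BLANK and best != prev:  # standard CTC collapse rule
            hyp += vocab[best]
        prev = best
    return hyp


# Toy example: "文心" is the context phrase; 闻 and 新 are homophones of 文 and 心.
VOCAB = ["<b>", "文", "闻", "心", "新"]
POSTERIORS = [
    [0.9, 0.025, 0.025, 0.025, 0.025],
    [0.1, 0.3, 0.4, 0.1, 0.1],
    [0.8, 0.05, 0.05, 0.05, 0.05],
    [0.1, 0.05, 0.05, 0.35, 0.45],
]

if __name__ == "__main__":
    print(spike_triggered_greedy(POSTERIORS, VOCAB, ["文心"], bonus=1.0))  # 文心
    print(spike_triggered_greedy(POSTERIORS, VOCAB, ["文心"], bonus=0.0))  # 闻新
```

With `bonus=0.0` the unbiased model picks the homophones 闻 and 新; with the bonus, the context phrase 文心 wins at the two spike frames. Practical contextual shallow fusion usually walks a prefix trie or WFST and withdraws the accumulated bonus when a partial match fails; this sketch omits that failure penalty for brevity.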


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research