A Neural Model for Contextual Biasing Score Learning and Filtering
Wanting Huang, Weiran Wang

TL;DR
This paper presents a neural attention-based biasing model that improves speech recognition accuracy by effectively filtering candidate phrases and integrating external knowledge during decoding.
Contribution
It introduces a novel per-token discriminative training objective and demonstrates improved biasing performance in ASR systems using shallow fusion.
Findings
Filters out the majority of candidate phrases on the Librispeech biasing benchmark.
Improves recognition accuracy under various biasing conditions.
The method is modular and compatible with any ASR system.
Abstract
Contextual biasing improves automatic speech recognition (ASR) by integrating external knowledge, such as user-specific phrases or entities, during decoding. In this work, we use an attention-based biasing decoder to produce scores for candidate phrases based on acoustic information extracted by an ASR encoder; these scores can be used to filter out unlikely phrases and to calculate the bonus for shallow-fusion biasing. We introduce a per-token discriminative objective that encourages higher scores for ground-truth phrases while suppressing distractors. Experiments on the Librispeech biasing benchmark show that our method effectively filters out the majority of the candidate phrases and significantly improves recognition accuracy under different biasing conditions when the scores are used in shallow-fusion biasing. Our approach is modular and can be used with any ASR system, and the filtering…
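As a rough illustration of the two uses of the scores described in the abstract (filtering candidates and adding a shallow-fusion bonus), here is a minimal Python sketch. It is not the authors' implementation. The function names, the fixed threshold, the whole-phrase matching, and the linear bonus weight are all assumptions made for illustration, and the phrase scores are made-up numbers standing in for the output of the biasing decoder.

```python
from typing import Dict, List


def filter_phrases(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep only candidate phrases whose biasing score reaches the threshold.

    `scores` stands in for the biasing decoder's output (hypothetical values).
    """
    return [phrase for phrase, score in scores.items() if score >= threshold]


def fused_score(
    asr_log_prob: float,
    hypothesis: str,
    kept_phrases: List[str],
    scores: Dict[str, float],
    weight: float,
) -> float:
    """Add a bonus to an ASR hypothesis score for each kept phrase it contains.

    Assumption: the bonus is the phrase score scaled by `weight`, applied once
    per phrase that appears as a whole-word match in the hypothesis.
    """
    padded = f" {hypothesis} "
    bonus = sum(scores[p] for p in kept_phrases if f" {p} " in padded)
    return asr_log_prob + weight * bonus


# Hypothetical scores for three candidate phrases.
phrase_scores = {"weiran wang": 0.9, "wayne wong": 0.1, "librispeech": 0.7}
kept = filter_phrases(phrase_scores, threshold=0.5)

# The hypothesis containing a kept phrase gets a bonus; the other does not.
biased = fused_score(-4.0, "call weiran wang", kept, phrase_scores, weight=2.0)
unbiased = fused_score(-3.5, "call wayne wong", kept, phrase_scores, weight=2.0)
```

In this toy setting the low-scoring distractor is dropped before fusion, so only the hypothesis containing a surviving phrase is boosted. A real system would apply the bonus incrementally during beam search rather than to finished hypotheses.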
