Personalization for BERT-based Discriminative Speech Recognition Rescoring
Jari Kolehmainen, Yile Gu, Aditya Gourav, Prashanth Gurunath Shivakumar, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

TL;DR
This paper investigates three novel ways to use personalized content in BERT-based discriminative rescoring for speech recognition: gazetteers, prompting, and cross-attention models. On personalized data, each improves WER by over 10% against a neural rescoring baseline.
Contribution
It introduces and compares three new approaches for incorporating personalized content into speech recognition rescoring. On a test set with personalized named entities, each approach improves WER by over 10% against a neural rescoring baseline.
Findings
Gazetteers performed best, reducing WER by 10% on personalized data.
Natural language prompts improved WER by 7% without any training.
All three approaches outperformed the neural rescoring baseline on personalized content.
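The findings are stated as WER improvements. The following minimal Python sketch shows how such numbers are commonly computed: corpus-level WER is the word-level Levenshtein distance (substitutions, deletions, insertions) divided by the number of reference words, and a relative reduction compares two WERs. It assumes the reported percentages are relative reductions, which the summary does not state explicitly. The function names and example sentences are hypothetical.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def corpus_wer(refs: list[str], hyps: list[str]) -> float:
    """Total word errors divided by total reference words."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fractional improvement of new_wer over baseline_wer (0.10 means 10%)."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical toy data: the rescored output fixes the personalized name only.
refs = ["call jane doe", "play my workout playlist"]
baseline_hyps = ["call jane dough", "play my work out playlist"]
rescored_hyps = ["call jane doe", "play my work out playlist"]

baseline_wer = corpus_wer(refs, baseline_hyps)
rescored_wer = corpus_wer(refs, rescored_hyps)
print(round(baseline_wer, 3))  # 0.429
print(round(rescored_wer, 3))  # 0.286
print(round(relative_wer_reduction(baseline_wer, rescored_wer), 3))  # 0.333
```

A relative reduction is independent of the baseline's absolute WER, so a "10%" figure can correspond to very different absolute gains on different test sets.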
Abstract
Recognition of personalized content remains a challenge in end-to-end speech recognition. We explore three novel approaches that use personalized content in a neural rescoring step to improve recognition: gazetteers, prompting, and a cross-attention based encoder-decoder model. We use internal de-identified en-US data from interactions with a virtual voice assistant supplemented with personalized named entities to compare these approaches. On a test set with personalized named entities, we show that each of these approaches improves word error rate by over 10%, against a neural rescoring baseline. We also show that on this test set, natural language prompts can improve word error rate by 7% without any training and with a marginal loss in generalization. Overall, gazetteers were found to perform the best with a 10% improvement in word error rate (WER), while also improving WER on a…
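The abstract describes using personalized content in a second-pass neural rescoring step over first-pass hypotheses. This sketch illustrates only the general shape of prompt-based n-best rescoring and rests on several assumptions. The prompt template, the interpolation weight `lam`, the `Hypothesis` record, and `toy_scorer` are hypothetical. `toy_scorer` is a word-overlap stand-in for the BERT-based rescorer, which is not reproduced here, so none of this should be read as the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Hypothesis:
    text: str
    first_pass_score: float  # hypothetical first-pass log-score (higher is better)


def build_prompt(entities: Sequence[str]) -> str:
    """Hypothetical natural-language prompt listing the user's personalized entities."""
    return "The user's contacts include " + ", ".join(entities) + "."


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    context: str,
    scorer: Callable[[str, str], float],
    lam: float = 0.5,  # hypothetical interpolation weight for the second-pass score
) -> list[tuple[float, str]]:
    """Rank hypotheses by first-pass score + lam * second-pass score, best first."""
    scored = [(h.first_pass_score + lam * scorer(context, h.text), h.text) for h in nbest]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_scorer(context: str, text: str) -> float:
    """Stand-in rescorer (not BERT): +1 for each hypothesis word found in the context."""
    context_words = set(context.lower().replace(",", " ").replace(".", " ").split())
    return float(sum(word in context_words for word in text.lower().split()))


# Hypothetical 2-best list in which the first pass prefers the misrecognized name.
nbest = [Hypothesis("call jane dough", -1.0), Hypothesis("call jane doe", -1.2)]
prompt = build_prompt(["jane doe", "sam lee"])

print(rescore_nbest(nbest, prompt, toy_scorer)[0][1])  # call jane doe
print(rescore_nbest(nbest, "", toy_scorer)[0][1])      # call jane dough
```

With the prompt, the hypothesis that matches a personalized entity overtakes the first-pass favorite. With an empty context, the first-pass ranking is kept. A real system would replace `toy_scorer` with a trained rescorer. The paper's gazetteer and cross-attention variants supply personalized content to the model in other ways and are not modeled here.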
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Music and Audio Processing
