Personalization for BERT-based Discriminative Speech Recognition Rescoring

Jari Kolehmainen; Yile Gu; Aditya Gourav; Prashanth Gurunath Shivakumar; Ankur Gandhe; Ariya Rastrow; Ivan Bulyko

arXiv:2307.06832 · eess.AS · July 14, 2023

TL;DR

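This paper investigates three novel neural rescoring methods (gazetteers, prompting, and a cross-attention based encoder-decoder model) for improving recognition of personalized content in BERT-based speech recognition rescoring. On a test set with personalized named entities, each method improves word error rate (WER) by over 10% against a neural rescoring baseline.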

Contribution

It introduces and compares three new approaches for incorporating personalized content into the neural rescoring step of speech recognition, and shows that each improves WER over a neural rescoring baseline on a test set with personalized named entities.

Findings

01

Gazetteers performed best overall, with a 10% WER improvement on personalized data.

02

Natural language prompts improved WER by 7% without any training, with a marginal loss in generalization.

03

All approaches outperformed the baseline in personalized content recognition.
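
To make the prompting idea concrete, the following is a minimal sketch of n-best rescoring with a natural language prompt built from personalized entities. It is an illustration under stated assumptions, not the authors' implementation: the prompt template, the additive score combination, the `weight` parameter, and the `toy_score` function (a stand-in for a BERT-based rescoring model) are all hypothetical, since this summary does not specify them.

```python
from typing import Callable, List, Tuple


def build_prompt(entities: List[str]) -> str:
    # Hypothetical template; the wording used in the paper is not given here.
    return "The user's contacts are " + ", ".join(entities) + "."


def rescore(
    nbest: List[Tuple[str, float]],
    entities: List[str],
    second_pass_score: Callable[[str, str], float],
    weight: float = 1.0,
) -> str:
    """Return the hypothesis with the best combined score.

    nbest holds (hypothesis, first_pass_score) pairs, higher is better.
    The additive combination with `weight` is an assumption.
    """
    prompt = build_prompt(entities)
    best = max(
        nbest,
        key=lambda item: item[1] + weight * second_pass_score(prompt, item[0]),
    )
    return best[0]


def toy_score(prompt: str, hypothesis: str) -> float:
    # Stand-in for a neural rescoring model: counts hypothesis words that
    # also appear in the prompt. A real system would use a model score.
    cleaned = prompt.lower().replace(",", " ").replace(".", " ")
    prompt_words = set(cleaned.split())
    return float(sum(word in prompt_words for word in hypothesis.lower().split()))


nbest = [("call john smyth", -1.0), ("call jon smith", -1.2)]
print(rescore(nbest, ["Jon Smith"], toy_score))
```

In this toy setup the first-pass score prefers the first hypothesis, and the prompt shifts the choice toward the hypothesis that matches the personalized entity. No rescoring model is trained here, which mirrors the "without any training" setting only in spirit.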

Abstract

Recognition of personalized content remains a challenge in end-to-end speech recognition. We explore three novel approaches that use personalized content in a neural rescoring step to improve recognition: gazetteers, prompting, and a cross-attention based encoder-decoder model. We use internal de-identified en-US data from interactions with a virtual voice assistant supplemented with personalized named entities to compare these approaches. On a test set with personalized named entities, we show that each of these approaches improves word error rate by over 10%, against a neural rescoring baseline. We also show that on this test set, natural language prompts can improve word error rate by 7% without any training and with a marginal loss in generalization. Overall, gazetteers were found to perform the best with a 10% improvement in word error rate (WER), while also improving WER on a…
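
For readers unfamiliar with the metric, here is a short sketch of how word error rate and a relative WER improvement can be computed. It uses the standard word-level edit distance definition of WER; the assumption that the reported percentages are relative improvements over the baseline, and the function names, are ours rather than taken from the text above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


print(word_error_rate("play songs by sade", "play songs by shade"))
print(relative_improvement(0.20, 0.18))
```

Under this reading, a system that lowers WER from 0.20 to 0.18 shows a 10% relative improvement. Test-set WER is normally computed by summing errors and reference words over all utterances rather than averaging per-utterance values.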

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Music and Audio Processing