Personalization of End-to-end Speech Recognition On Mobile Devices For Named Entities
Khe Chai Sim, Françoise Beaufays, Arnaud Benard, Dhruv Guliani, Andreas Kabel, Nikhil Khare, Tamar Lucassen, Petr Zadrazil, Harry Zhang, Leif Johnson, Giovanni Motta, Lillian Zhou

TL;DR
This paper explores techniques for personalizing end-to-end speech recognition on mobile devices to better recognize user-specific proper names, raising name recall from a 2.4% baseline to as much as 73.5% without relying on server-based data storage.
Contribution
It introduces methods for on-device personalization of speech models, including data synthesis and user correction strategies, to enhance proper name recognition.
Findings
Data synthesis increases name recall from 2.4% to 48.6%.
With speech input, correcting only the names raises recall to 64.4%; correcting all recognition errors raises it to 73.5%.
Personalization is performed entirely on mobile devices.
Abstract
We study the effectiveness of several techniques to personalize end-to-end speech models and improve the recognition of proper names relevant to the user. These techniques differ in the amounts of user effort required to provide supervision, and are evaluated on how they impact speech recognition performance. We propose using keyword-dependent precision and recall metrics to measure vocabulary acquisition performance. We evaluate the algorithms on a dataset that we designed to contain names of persons that are difficult to recognize. Therefore, the baseline recall rate for proper names in this dataset is very low: 2.4%. A data synthesis approach we developed brings it to 48.6%, with no need for speech input from the user. With speech input, if the user corrects only the names, the name recall rate improves to 64.4%. If the user corrects all the recognition errors, we achieve the best…
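The abstract names keyword-dependent precision and recall but this summary does not give their exact definition. As a rough illustration only, the sketch below assumes a simple bag-of-words reading: for a given set of keywords (here, proper names), recall is the fraction of keyword occurrences in the reference transcripts that also appear in the matching hypothesis, and precision is the fraction of keyword occurrences in the hypotheses that are matched in the reference. The function name `keyword_precision_recall` and the example utterances are hypothetical and are not taken from the paper.

```python
from collections import Counter


def keyword_precision_recall(pairs, keywords):
    """Compute precision and recall restricted to a set of keywords.

    pairs:    iterable of (reference, hypothesis) transcript strings.
    keywords: set of lowercase words to score (e.g. proper names).

    Assumption: matching is per utterance and ignores word order, so a
    keyword counts as correct if it appears in both transcripts.
    """
    true_pos = ref_total = hyp_total = 0
    for ref, hyp in pairs:
        # Count only keyword occurrences in each transcript.
        ref_counts = Counter(w for w in ref.lower().split() if w in keywords)
        hyp_counts = Counter(w for w in hyp.lower().split() if w in keywords)
        ref_total += sum(ref_counts.values())
        hyp_total += sum(hyp_counts.values())
        # Counter intersection keeps the minimum count per keyword.
        true_pos += sum((ref_counts & hyp_counts).values())
    precision = true_pos / hyp_total if hyp_total else 0.0
    recall = true_pos / ref_total if ref_total else 0.0
    return precision, recall


# Made-up example: one missed name, one correct name, one spurious name.
example_pairs = [
    ("call anneliese now", "call anna lisa now"),
    ("text siobhan hello", "text siobhan hello"),
    ("play music", "play siobhan"),
]
precision, recall = keyword_precision_recall(example_pairs, {"anneliese", "siobhan"})
```

Scoring only the keywords keeps the metric focused on vocabulary acquisition: a general word error rate could improve or degrade without showing whether the names themselves are being recognized.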
