Personalization Strategies for End-to-End Speech Recognition Systems
Aditya Gourav, Linda Liu, Ankur Gandhe, Yile Gu, Guitang Lan, Xiangyang Huang, Shashank Kalmane, Gautam Tiwari, Denis Filimonov, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko

TL;DR
This paper presents new biasing and rescoring techniques for end-to-end speech recognition systems that significantly improve personalized content recognition without sacrificing overall accuracy.
Contribution
It introduces a scalable word-level biasing algorithm and a novel second-pass de-biasing method that enhance personalized recognition in end-to-end systems.
Findings
Up to 16% improvement in personalized content recognition.
Additional 14% improvement with second-pass de-biasing.
Up to 2.5% accuracy gain for the general use case.
Abstract
The recognition of personalized content, such as contact names, remains a challenging problem for end-to-end speech recognition systems. In this work, we demonstrate how first- and second-pass rescoring strategies can be leveraged together to improve the recognition of such words. Following previous work, we use a shallow fusion approach to bias towards recognition of personalized content in the first-pass decoding. We show that such an approach can improve personalized content recognition by up to 16% with minimal degradation on the general use case. We describe a fast and scalable algorithm that enables our biasing models to remain at the word level, while applying the biasing at the subword level. This has the advantage of not requiring the biasing models to be dependent on any subword symbol table. We also describe a novel second-pass de-biasing approach: used in conjunction with a…
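Illustrative sketch (not the authors' algorithm): the abstract does not spell out how word-level biasing is applied at the subword level, so the Python below shows one common way to do it, under stated assumptions. The bias list is kept as plain words, subword pieces are matched against word prefixes as they are emitted, and any bonus granted for a word that ends up not matching is revoked. The names WordLevelBiaser, BiasState, fused_score, the "▁" word-boundary marker, and the fixed per-piece bonus (standing in for a biasing-model log-probability) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass

WORD_START = "\u2581"  # assumed SentencePiece-style word-boundary marker


@dataclass(frozen=True)
class BiasState:
    partial: str = ""     # characters of the word currently being spelled
    pending: float = 0.0  # bonus granted so far for this unfinished word


class WordLevelBiaser:
    """Hypothetical biaser: word-level list, scored one subword at a time."""

    def __init__(self, bias_words, bonus=1.0):
        self.words = {w.lower() for w in bias_words}
        # Every non-empty prefix of every bias word, so partial words can match.
        self.prefixes = {w[:i] for w in self.words for i in range(1, len(w) + 1)}
        self.bonus = bonus

    def step(self, state, subword):
        """Return (score_delta, new_state) for appending one subword."""
        delta = 0.0
        if subword.startswith(WORD_START):
            # A new word begins: revoke the bonus if the previous word
            # was only a partial match of a bias word.
            if state.partial not in self.words:
                delta -= state.pending
            state = BiasState()
            piece = subword[len(WORD_START):]
        else:
            piece = subword
        partial = state.partial + piece.lower()
        if partial in self.prefixes:
            return delta + self.bonus, BiasState(partial, state.pending + self.bonus)
        # The word has left the bias list: revoke anything granted for it.
        return delta - state.pending, BiasState(partial, 0.0)

    def finish(self, state):
        """Score adjustment at end of hypothesis (revoke an incomplete match)."""
        return 0.0 if state.partial in self.words else -state.pending


def fused_score(asr_logprob, subwords, biaser, weight=0.5):
    """Shallow-fusion style combination: ASR score + weight * biasing score."""
    state, bias_total = BiasState(), 0.0
    for sw in subwords:
        delta, state = biaser.step(state, sw)
        bias_total += delta
    bias_total += biaser.finish(state)
    return asr_logprob + weight * bias_total


biaser = WordLevelBiaser(["Anna"])
matched = fused_score(-4.0, ["\u2581call", "\u2581an", "na"], biaser)
unmatched = fused_score(-4.0, ["\u2581call", "\u2581an", "dy"], biaser)
```

In this sketch the biaser never sees subword IDs, only the characters the pieces spell, which is one way to keep a biasing model independent of any particular subword symbol table. A real first-pass decoder would call step once per beam extension and carry the BiasState alongside each hypothesis.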
