Personalization Strategies for End-to-End Speech Recognition Systems

Aditya Gourav; Linda Liu; Ankur Gandhe; Yile Gu; Guitang Lan; Xiangyang Huang; Shashank Kalmane; Gautam Tiwari; Denis Filimonov; Ariya Rastrow; Andreas Stolcke; Ivan Bulyko

arXiv:2102.07739 · cs.CL · February 16, 2021

TL;DR

This paper presents biasing and rescoring techniques for end-to-end speech recognition systems that improve personalized content recognition by up to 16% in the first pass, and by a further 14% with second-pass de-biasing, without sacrificing overall accuracy.

Contribution

It introduces a scalable word-level biasing algorithm and a novel second-pass de-biasing method that enhance personalized recognition in end-to-end systems.

Findings

01. Up to 16% improvement in personalized content recognition.

02. An additional 14% improvement with second-pass de-biasing.

03. Up to 2.5% accuracy gain for the general use case.

Abstract

The recognition of personalized content, such as contact names, remains a challenging problem for end-to-end speech recognition systems. In this work, we demonstrate how first- and second-pass rescoring strategies can be leveraged together to improve the recognition of such words. Following previous work, we use a shallow fusion approach to bias towards recognition of personalized content in the first-pass decoding. We show that such an approach can improve personalized content recognition by up to 16% with minimal degradation on the general use case. We describe a fast and scalable algorithm that enables our biasing models to remain at the word level, while applying the biasing at the subword level. This has the advantage of not requiring the biasing models to be dependent on any subword symbol table. We also describe a novel second-pass de-biasing approach: used in conjunction with a…
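To make the idea of word-level biasing applied at the subword level more concrete, here is a minimal sketch. It is not the paper's algorithm, whose details are not given in this summary; it is a generic illustration of shallow fusion in which a word's bias weight is spread across its subword pieces and revoked if the word is never completed. The names WordBiaser, fused_score, toy_segment, the interpolation weight lam, and the example weights are all hypothetical. The segmenter is passed in as a function, so the word-level weights do not depend on any particular subword inventory.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[Tuple[str, ...], float]  # (matched subword prefix, bonus accrued so far)


def toy_segment(word: str) -> List[str]:
    """Hypothetical stand-in for a real subword tokenizer: 2-character chunks."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


class WordBiaser:
    """Keeps bias weights per word, but hands out bonuses per subword piece."""

    def __init__(self, word_weights: Dict[str, float],
                 segment: Callable[[str], List[str]]) -> None:
        self.step_bonus: Dict[Tuple[str, ...], float] = {}
        self.complete = set()
        for word, weight in word_weights.items():
            pieces = tuple(segment(word))
            share = weight / len(pieces)  # spread the word weight over its pieces
            for i in range(1, len(pieces) + 1):
                prefix = pieces[:i]
                self.step_bonus[prefix] = max(self.step_bonus.get(prefix, 0.0), share)
            self.complete.add(pieces)

    def advance(self, state: State, piece: str) -> Tuple[State, float]:
        """Consume one subword piece; return the new state and the score delta."""
        prefix, accrued = state
        delta = 0.0
        # First try to extend the current partial word, then try a fresh start.
        for candidate in (prefix + (piece,), (piece,)):
            if candidate in self.step_bonus:
                bonus = self.step_bonus[candidate]
                if candidate in self.complete:
                    # Simplification: reset on completion, even if a longer
                    # biased word shares this prefix.
                    return ((), 0.0), delta + bonus
                return (candidate, accrued + bonus), delta + bonus
            # Extension failed: revoke the partial bonus handed out so far.
            delta, accrued = delta - accrued, 0.0
        return ((), 0.0), delta


def fused_score(pieces: List[str], e2e_logprob: float,
                biaser: WordBiaser, lam: float) -> float:
    """Shallow fusion: end-to-end log-probability plus a weighted bias score."""
    state: State = ((), 0.0)
    bias_total = 0.0
    for piece in pieces:
        state, delta = biaser.advance(state, piece)
        bias_total += delta
    bias_total -= state[1]  # revoke the bonus of a word left unfinished
    return e2e_logprob + lam * bias_total
```

With a hypothetical contact list {"anika": 3.0} and lam = 0.5, a hypothesis whose pieces are ["call", "an", "ik", "a"] collects the full word bonus, while ["call", "an", "ne"] receives a partial bonus on "an" that is taken back once "ne" fails to extend the biased word. Revoking partial bonuses is one common way to keep biasing from favoring hypotheses that merely share a prefix with a personalized word.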
