An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models
Khe Chai Sim, Petr Zadrazil, Françoise Beaufays

TL;DR
This paper explores on-device personalization of end-to-end speech recognition models, balancing privacy, performance, and resource constraints, and demonstrates significant WER improvements with privacy-preserving training methods.
Contribution
It introduces methods for on-device training of personalized speech models, reducing memory usage and maintaining competitive accuracy without data leaving the device.
Findings
Personalization reduces WER by over 58%.
On-device training causes 18.7% performance degradation.
A 45% memory reduction is achieved at the cost of increased training time.
Abstract
Speaker-independent speech recognition systems trained with data from many users are generally robust against speaker variability and work well for a large population of speakers. However, these systems do not always generalize well for users with very different speech characteristics. This issue can be addressed by building personalized systems that are designed to work well for each specific user. In this paper, we investigate the idea of securely training personalized end-to-end speech recognition models on mobile devices so that user data and models never leave the device and are never stored on a server. We study how the mobile training environment impacts performance by simulating on-device data consumption. We conduct experiments using data collected from speech-impaired users for personalization. Our results show that personalization achieved 63.7% relative word error rate…
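The results above are reported as relative word error rate (WER) changes. As a minimal sketch of how such a figure is typically computed, the Python below derives WER from a word-level edit distance and then the relative reduction between a baseline and a personalized model. The function names and the example sentences are hypothetical and are not taken from the paper; the paper's own scoring setup may differ (for example in text normalization).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, personalized_wer: float) -> float:
    """Fraction of the baseline WER removed by personalization."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - personalized_wer) / baseline_wer


# Hypothetical transcripts, for illustration only.
reference = "turn on the lights"
baseline = word_error_rate(reference, "turn the light")         # 2 errors / 4 words
personalized = word_error_rate(reference, "turn on the light")  # 1 error / 4 words
print(f"relative WER reduction: {relative_wer_reduction(baseline, personalized):.1%}")
```

Under this definition, a relative reduction is measured against the baseline WER rather than in absolute percentage points, so a drop from 50% to 25% WER is a 50% relative reduction.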
