The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning
Lillian Zhou, Yuxin Ding, Mingqing Chen, Harry Zhang, Rohit Prabhavalkar, Dhruv Guliani, Giovanni Motta, Rajiv Mathews

TL;DR
This paper proposes a federated learning approach for ASR models that learns from user corrections on edge devices, improving recognition of new and long-tail terms without degrading overall performance.
Contribution
It introduces techniques to adapt ASR models to new vocabulary through federated learning, addressing challenges like fresh terms and catastrophic forgetting.
Findings
Improved recognition of new and long-tail words.
Preserved recognition quality on the overall language distribution.
Effective federated learning strategies for on-device ASR adaptation.
Abstract
Automatic speech recognition (ASR) models are typically trained on large datasets of transcribed speech. As language evolves and new terms come into use, these models can become outdated and stale. In the context of models trained on the server but deployed on edge devices, errors may result from the mismatch between server training data and actual on-device usage. In this work, we seek to continually learn from on-device user corrections through Federated Learning (FL) to address this issue. We explore techniques to target fresh terms that the model has not previously encountered, learn long-tail words, and mitigate catastrophic forgetting. In experimental evaluations, we find that the proposed techniques improve model recognition of fresh terms, while preserving quality on the overall language distribution.
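To make the federated setup concrete, here is a minimal sketch of the server-side aggregation step, assuming plain federated averaging weighted by the number of corrected utterances each device trained on. The abstract does not say which aggregation rule the authors use, so the function name `federated_average`, the flat weight-vector representation, and the weighting scheme are all illustrative assumptions rather than the paper's method.

```python
from typing import List, Tuple


def federated_average(
    client_updates: List[Tuple[List[float], int]],
) -> List[float]:
    """Average client weight vectors, weighted by local example count.

    Each entry is (weights, num_examples). Here num_examples is assumed
    to be the number of user-corrected utterances used for local training
    (a hypothetical choice, not confirmed by the source).
    """
    if not client_updates:
        raise ValueError("need at least one client update")

    total_examples = sum(count for _, count in client_updates)
    if total_examples <= 0:
        raise ValueError("total example count must be positive")

    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for weights, count in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of equal length")
        # Clients with more corrected utterances contribute proportionally more.
        for i, value in enumerate(weights):
            averaged[i] += value * count / total_examples
    return averaged


# Two hypothetical devices: one trained on 1 correction, the other on 3.
new_weights = federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)])
```

In this sketch only model weights and example counts leave each device; the corrected transcripts themselves stay local, which is the general motivation for using federated learning here.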
Taxonomy
Topics: Speech Recognition and Synthesis · Internet Traffic Analysis and Secure E-voting · Topic Modeling
