The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning

Lillian Zhou; Yuxin Ding; Mingqing Chen; Harry Zhang; Rohit Prabhavalkar; Dhruv Guliani; Giovanni Motta; Rajiv Mathews

arXiv:2310.00141·cs.CL·December 4, 2023

TL;DR

This paper proposes a federated learning approach for ASR models that learns from user corrections on edge devices, improving recognition of new and long-tail terms without degrading overall performance.

Contribution

It introduces techniques to adapt ASR models to new vocabulary through federated learning, addressing challenges like fresh terms and catastrophic forgetting.

Findings

01 Improved recognition of new and long-tail words.

02 Preserved recognition quality on the overall language distribution.

03 Effective federated learning strategies for on-device ASR adaptation.

Abstract

Automatic speech recognition (ASR) models are typically trained on large datasets of transcribed speech. As language evolves and new terms come into use, these models can become outdated and stale. In the context of models trained on the server but deployed on edge devices, errors may result from the mismatch between server training data and actual on-device usage. In this work, we seek to continually learn from on-device user corrections through Federated Learning (FL) to address this issue. We explore techniques to target fresh terms that the model has not previously encountered, learn long-tail words, and mitigate catastrophic forgetting. In experimental evaluations, we find that the proposed techniques improve model recognition of fresh terms, while preserving quality on the overall language distribution.
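
To make the federated setup concrete, here is a minimal sketch of one common aggregation rule, federated averaging. It rests on assumptions not taken from the paper: the abstract does not say which aggregation rule the authors use, and a toy two-parameter linear model stands in for an ASR model. The names `local_update`, `federated_average`, and `run_round` are hypothetical, and each `(x, y)` pair is only a stand-in for a user-corrected example that never leaves the device.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (input, corrected target); toy stand-in for a user correction


def local_update(weights: Sequence[float], examples: Sequence[Example],
                 lr: float = 0.1) -> List[float]:
    """Run one pass of SGD on a toy linear model y = w0 + w1 * x.

    This happens on the device; only the resulting weights are shared.
    """
    w0, w1 = weights
    for x, y in examples:
        err = (w0 + w1 * x) - y  # prediction error on this example
        w0 -= lr * err           # gradient step for the bias
        w1 -= lr * err * x       # gradient step for the slope
    return [w0, w1]


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client weights, weighting each client by its example count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def run_round(global_weights: Sequence[float],
              clients: Sequence[Sequence[Example]]) -> List[float]:
    """One round: each client trains locally, the server averages the results."""
    updates = [local_update(global_weights, data) for data in clients]
    sizes = [len(data) for data in clients]
    return federated_average(updates, sizes)


if __name__ == "__main__":
    # Two hypothetical devices whose corrections all follow y = 2x.
    clients = [[(1.0, 2.0), (2.0, 4.0)], [(0.5, 1.0)]]
    weights = [0.0, 0.0]
    for _ in range(1000):
        weights = run_round(weights, clients)
    print(weights)  # should approach [0.0, 2.0]
```

In this sketch the server only ever sees weight vectors, not the examples themselves. It does not model the techniques the abstract mentions for targeting fresh terms or mitigating catastrophic forgetting.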

Taxonomy

Topics: Speech Recognition and Synthesis · Internet Traffic Analysis and Secure E-voting · Topic Modeling