Training Speech Recognition Models with Federated Learning: A Quality/Cost Framework
Dhruv Guliani, Francoise Beaufays, Giovanni Motta

TL;DR
This paper proposes training speech recognition models with federated learning and introduces a framework that varies how non-IID the per-user training data is, exposing a trade-off between model quality and the computational cost of federated training.
Contribution
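It introduces a novel metric that captures the trade-off between model quality and the computational cost of federated training, and shows that hyper-parameter optimization and appropriate use of variational noise can compensate for the quality impact of non-IID data while decreasing the cost.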
Findings
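A new metric captures the trade-off between model quality and the computational cost of federated training as the degree of non-IID-ness varies.
Hyper-parameter optimization helps compensate for the quality loss caused by non-IID data distributions.
Appropriate use of variational noise helps maintain quality while decreasing the cost.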
Abstract
We propose using federated learning, a decentralized on-device learning paradigm, to train speech recognition models. By performing epochs of training on a per-user basis, federated learning must incur the cost of dealing with non-IID data distributions, which are expected to negatively affect the quality of the trained model. We propose a framework by which the degree of non-IID-ness can be varied, consequently illustrating a trade-off between model quality and the computational cost of federated training, which we capture through a novel metric. Finally, we demonstrate that hyper-parameter optimization and appropriate use of variational noise are sufficient to compensate for the quality impact of non-IID distributions, while decreasing the cost.
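To make the idea of a variable degree of non-IID-ness concrete, the following is a minimal sketch of one possible knob. It is an illustrative assumption and not the framework described in the paper: the function names, the `mix` parameter, and the toy `(speaker_id, utterance)` data are all hypothetical. Each client starts with one speaker's data, and a fraction `mix` of examples is reassigned to a random client, so `mix=0` is fully per-speaker (non-IID) and `mix=1` approaches a uniformly shuffled (IID-like) split.

```python
import random


def partition_by_speaker(examples, mix, seed=0):
    """Assign (speaker_id, utterance) pairs to one client per speaker.

    Hypothetical knob: with probability `mix` an example is sent to a
    randomly chosen client instead of its own speaker's client.
    mix=0.0 keeps every client single-speaker; mix=1.0 scatters all
    examples uniformly at random across clients.
    """
    rng = random.Random(seed)
    speakers = sorted({speaker for speaker, _ in examples})
    clients = {speaker: [] for speaker in speakers}
    for speaker, utterance in examples:
        owner = rng.choice(speakers) if rng.random() < mix else speaker
        clients[owner].append((speaker, utterance))
    return clients


def mean_speakers_per_client(clients):
    """Average number of distinct speakers held by each non-empty client."""
    counts = [len({s for s, _ in data}) for data in clients.values() if data]
    return sum(counts) / len(counts)


if __name__ == "__main__":
    # Toy data: 4 speakers with 50 placeholder utterances each.
    toy = [(f"spk{i}", f"utt{i}_{j}") for i in range(4) for j in range(50)]
    for mix in (0.0, 0.5, 1.0):
        split = partition_by_speaker(toy, mix)
        print(mix, mean_speakers_per_client(split))
```

The average number of distinct speakers per client is used here only as a rough proxy for how mixed each client's data is; it is not the metric the paper proposes.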
