FedNST: Federated Noisy Student Training for Automatic Speech Recognition
Haaris Mehmood, Agnieszka Dobrowolska, Karthikeyan Saravanan, Mete Ozay

TL;DR
FedNST is a federated semi-/self-supervised training method for ASR that leverages unlabelled user data, achieving a 22.5% relative WER reduction on LibriSpeech without transmitting raw user data to a central server.
Contribution
This paper presents FedNST, a novel method that combines federated learning with semi-/self-supervised learning to train ASR models on private, unlabelled user data.
Findings
22.5% relative WER reduction on LibriSpeech
Effective training with mixed labelled and unlabelled data
Evaluation on 1173 simulated clients
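For readers unfamiliar with the metric, a relative WER reduction compares the new word error rate against a baseline rather than reporting an absolute difference. The helper below is a minimal sketch of that arithmetic; the function name and the example values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the fractional WER reduction relative to the baseline.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values for illustration only (not results from the paper):
# a drop from 10.0% WER to 8.0% WER is a 20% relative reduction.
reduction = relative_wer_reduction(10.0, 8.0)
print(f"{reduction:.1%}")
```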
Abstract
Federated Learning (FL) enables training state-of-the-art Automatic Speech Recognition (ASR) models on user devices (clients) in distributed systems, hence preventing transmission of raw user data to a central server. A key challenge facing practical adoption of FL for ASR is obtaining ground-truth labels on the clients. Existing approaches rely on clients to manually transcribe their speech, which is impractical for obtaining large training corpora. A promising alternative is using semi-/self-supervised learning approaches to leverage unlabelled user data. To this end, we propose FedNST, a novel method for training distributed ASR models using private and unlabelled user data. We explore various facets of FedNST, such as training models with different proportions of labelled and unlabelled data, and evaluate the proposed approach on 1173 simulated clients. Evaluating FedNST on…
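The general idea of combining pseudo-labelling with federated training can be sketched on a toy problem. The code below is not the FedNST algorithm: it assumes a tiny logistic-regression model instead of an ASR model, Gaussian input noise as a stand-in for the student's augmentation, and FedAvg-style weighted averaging on the server (the excerpt above does not state which aggregation rule is used). All function names are hypothetical.

```python
import math
import random


def pseudo_label(weights, example):
    """Teacher step: label an unlabelled example with the frozen global model."""
    score = sum(w * x for w, x in zip(weights, example))
    return 1.0 if score > 0 else 0.0


def local_update(weights, examples, rng, lr=0.1, noise_std=0.1):
    """Student step on one client: SGD on noised inputs with pseudo-labels.

    Raw examples and their pseudo-labels never leave this function;
    only the updated weights are returned to the server.
    """
    new_weights = list(weights)
    for example in examples:
        label = pseudo_label(weights, example)
        # Assumed noise model: additive Gaussian noise on the input features.
        noisy = [x + rng.gauss(0.0, noise_std) for x in example]
        score = sum(w * x for w, x in zip(new_weights, noisy))
        pred = 1.0 / (1.0 + math.exp(-score))
        new_weights = [w - lr * (pred - label) * x
                       for w, x in zip(new_weights, noisy)]
    return new_weights


def fedavg(client_weights, client_sizes):
    """Server step: average client models, weighted by local example count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]


def federated_round(global_weights, client_data, rng):
    """One round: every client trains locally, then the server aggregates."""
    updates = [local_update(global_weights, examples, rng)
               for examples in client_data]
    sizes = [len(examples) for examples in client_data]
    return fedavg(updates, sizes)


# Two simulated clients holding unlabelled 2-D feature vectors (made-up data).
rng = random.Random(0)
clients = [[[1.0, 2.0]], [[0.0, 1.0], [2.0, 0.0]]]
new_global = federated_round([0.5, -0.5], clients, rng)
```

In this sketch the server only ever sees model weights and example counts, which mirrors the privacy motivation described in the abstract.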
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing
Methods: RandAugment · Dropout · Stochastic Depth · Noisy Student
