FedNST: Federated Noisy Student Training for Automatic Speech Recognition

Haaris Mehmood; Agnieszka Dobrowolska; Karthikeyan Saravanan; Mete Ozay

arXiv:2206.02797 · eess.AS · July 14, 2022

TL;DR

FedNST introduces a federated semi-/self-supervised training method for ASR that leverages unlabelled user data and achieves a 22.5% relative WER reduction on LibriSpeech without transmitting raw user data to a central server.

Contribution

This paper presents FedNST, a novel approach that combines federated learning with semi-/self-supervised (noisy student) training to learn ASR models from unlabelled user data.
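
In federated learning, the server never sees client data; it only combines model updates. A common way to do this is federated averaging (FedAvg), where each client's parameters are weighted by its number of training examples. The following is a minimal sketch assuming FedAvg-style aggregation and models stored as flat lists of floats; the function name `fedavg` is hypothetical, and the aggregation actually used in FedNST may differ.

```python
from typing import List, Sequence


def fedavg(client_weights: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each client by its
    number of local training examples (FedAvg-style aggregation)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        share = size / total  # fraction of all examples held by this client
        for i, w in enumerate(weights):
            averaged[i] += share * w
    return averaged


# Usage: two hypothetical clients holding 30 and 10 utterances.
print(fedavg([[1.0, 2.0], [5.0, 6.0]], [30, 10]))
```

Weighting by example count means clients with more (pseudo)labelled speech pull the global model further, which is the usual FedAvg design choice.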

Findings

01

22.5% relative WER reduction on LibriSpeech

02

Effective training with mixed labelled and unlabelled data

03

Evaluation on 1173 simulated clients
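
The headline result is a relative word error rate (WER) reduction. WER counts the word substitutions, deletions and insertions needed to turn a reference transcript into the hypothesis, divided by the number of reference words; the relative reduction compares a new system's WER with a baseline's. The sketch below uses hypothetical helper names and hypothetical WER values (not numbers from the paper) to show what "22.5% relative" means: a baseline WER of 10.0% falling to 7.75%.

```python
from typing import List


def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, deletions and insertions
    turning the reference into the hypothesis (word-level Levenshtein)."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # delete ref[i-1]
                          curr[j - 1] + 1,      # insert hyp[j-1]
                          prev[j - 1] + cost)   # substitute or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word errors divided by the number of reference words."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis) / n_ref


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction (WERR) of a new system over a baseline, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words
print(relative_wer_reduction(10.0, 7.75))  # hypothetical WERs, not from the paper
```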

Abstract

Federated Learning (FL) enables training state-of-the-art Automatic Speech Recognition (ASR) models on user devices (clients) in distributed systems, hence preventing transmission of raw user data to a central server. A key challenge facing practical adoption of FL for ASR is obtaining ground-truth labels on the clients. Existing approaches rely on clients to manually transcribe their speech, which is impractical for obtaining large training corpora. A promising alternative is using semi-/self-supervised learning approaches to leverage unlabelled user data. To this end, we propose FedNST, a novel method for training distributed ASR models using private and unlabelled user data. We explore various facets of FedNST, such as training models with different proportions of labelled and unlabelled data, and evaluate the proposed approach on 1173 simulated clients. Evaluating FedNST on…
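
To make the idea concrete, one client-side step of federated noisy student training can be sketched as: use the current global model as a teacher to pseudolabel local unlabelled data, keep confident pseudolabels, train a student on them with added noise, and send back only the updated weights. This is a toy illustration, not the authors' implementation: a 1-D logistic-regression model stands in for an ASR model, Gaussian input noise stands in for speech augmentation, and the confidence filter, the choice of the global model as both teacher and student initialisation, and all function names are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical toy model: 1-D logistic regression with parameters (w, b).
Weights = Tuple[float, float]


def predict_proba(weights: Weights, x: float) -> float:
    """Probability of class 1 under the toy model (numerically stable sigmoid)."""
    w, b = weights
    z = w * x + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pseudolabel(teacher: Weights, xs: Sequence[float],
                threshold: float) -> List[Tuple[float, int]]:
    """Label unlabelled inputs with the teacher; keep only confident predictions."""
    kept: List[Tuple[float, int]] = []
    for x in xs:
        p = predict_proba(teacher, x)
        if max(p, 1.0 - p) >= threshold:  # assumed confidence filter
            kept.append((x, 1 if p >= 0.5 else 0))
    return kept


def train_noisy_student(init: Weights, data: Sequence[Tuple[float, int]],
                        epochs: int, lr: float, noise_std: float,
                        rng: random.Random) -> Weights:
    """SGD on logistic loss with Gaussian input noise standing in for
    data augmentation (the 'noisy' part of noisy student training)."""
    w, b = init
    for _ in range(epochs):
        for x, y in data:
            x_noisy = x + rng.gauss(0.0, noise_std)
            grad_z = predict_proba((w, b), x_noisy) - y  # dLoss/dz
            w -= lr * grad_z * x_noisy
            b -= lr * grad_z
    return (w, b)


def client_round(global_model: Weights, local_inputs: Sequence[float],
                 threshold: float = 0.8, epochs: int = 5, lr: float = 0.1,
                 noise_std: float = 0.3, seed: int = 0) -> Tuple[Weights, int]:
    """One hypothetical client update. The raw local inputs stay on the
    client; only the student weights and the number of examples used are
    returned for server-side aggregation."""
    rng = random.Random(seed)
    data = pseudolabel(global_model, local_inputs, threshold)
    student = train_noisy_student(global_model, data, epochs, lr, noise_std, rng)
    return student, len(data)


weights, n_used = client_round((2.0, 0.0), [-2.0, -1.5, -0.1, 0.05, 1.2, 2.5])
print(n_used, weights)
```

In this toy run the two inputs near the decision boundary are discarded by the confidence filter, so the student trains on the four confidently pseudolabelled examples. The server would then aggregate the returned weights across clients.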

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing

Methods: RandAugment · Dropout · Stochastic Depth · Noisy Student
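
Dropout and Stochastic Depth are listed here because, in Noisy Student Training, they serve as model noise applied to the student during training and switched off (or rescaled) at inference. The sketch below illustrates both in plain Python on lists of floats. It is not taken from the FedNST paper, the function names are hypothetical, and which noise sources FedNST actually uses should be checked against the paper. Dropout is written in the common "inverted" form; stochastic depth follows the original formulation, which scales the residual branch at inference time.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def dropout(x: Sequence[float], rate: float, training: bool,
            rng: random.Random) -> Vector:
    """Inverted dropout: during training, zero each element with probability
    `rate` and scale survivors by 1/(1 - rate) so the expectation is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(x)  # identity at inference
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def stochastic_depth_block(x: Sequence[float],
                           branch: Callable[[Sequence[float]], Vector],
                           survival_prob: float, training: bool,
                           rng: random.Random) -> Vector:
    """Residual block x + f(x) with stochastic depth: during training the
    branch f is skipped with probability 1 - survival_prob; at inference it
    is always applied and scaled by survival_prob."""
    if not 0.0 < survival_prob <= 1.0:
        raise ValueError("survival_prob must be in (0, 1]")
    if training:
        if rng.random() < survival_prob:
            return [a + b for a, b in zip(x, branch(x))]
        return list(x)  # whole residual branch dropped for this pass
    return [a + survival_prob * b for a, b in zip(x, branch(x))]


rng = random.Random(0)
double = lambda v: [2.0 * a for a in v]  # hypothetical residual branch
print(stochastic_depth_block([1.0, -1.0], double, 0.5, training=False, rng=rng))
print(dropout([1.0, 2.0, 3.0], 0.5, training=True, rng=rng))
```

Both functions are the identity in expectation-preserving terms at inference, so the noise only regularises the student during training and does not change how the final model is used.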