Federated Self-supervised Speech Representations: Are We There Yet?

Yan Gao; Javier Fernandez-Marques; Titouan Parcollet; Abhinav Mehrotra; Nicholas D. Lane

arXiv:2204.02804·cs.SD·July 21, 2022·1 cites

Open Access

TL;DR

This paper systematically examines the challenges of combining self-supervised learning and federated learning for speech models, revealing current limitations and future research directions to enable practical deployment.

Contribution

It provides the first comprehensive analysis of the feasibility, complexities, and bottlenecks of training speech SSL models with federated learning, highlighting key research opportunities.

Findings

01 Current system constraints hinder SSL and FL integration for speech

02 Hardware and algorithmic bottlenecks delay practical deployment until 2027

03 Identifies research directions to overcome existing limitations

Abstract

The ubiquity of microphone-enabled devices has led to large amounts of unlabelled audio data being produced at the edge. The integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees while also advancing the quality and robustness of speech representations. In this paper, we provide a first-of-its-kind systematic study of the feasibility and complexities of training speech SSL models under FL scenarios from the perspective of algorithms, hardware, and systems limits. Despite the high potential of their combination, we find existing system constraints and algorithmic behaviour make SSL and FL systems nearly impossible to build today. Yet critically, our results indicate specific performance bottlenecks and research opportunities that would allow this situation to be reversed. While our analysis…

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing