Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping
Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan "Honza" Silovsky, Kunal Talwar, Christopher G. Brinton, Tatiana Likhomanenko

TL;DR
This paper introduces the first benchmark for federated learning (FL) with differential privacy in speech recognition. It proposes layer-wise gradient normalization techniques to address gradient heterogeneity across the layers of large models, making privacy-preserving FL practical.
Contribution
It establishes a novel benchmark for differentially private federated learning (DP-FL) in automatic speech recognition (ASR) and develops layer-wise gradient normalization methods to improve convergence in large transformer models.
Findings
Achieves strong user-level differential privacy with minimal WER increase.
Demonstrates viability of DP-FL in ASR with large user populations.
Provides broader insights applicable to scalable privacy-preserving FL for large models.
Abstract
While federated learning (FL) and differential privacy (DP) have been extensively studied, their application to automatic speech recognition (ASR) remains largely unexplored due to the challenges in training large transformer models. Specifically, large models further exacerbate issues in FL as they are particularly susceptible to gradient heterogeneity across layers, unlike the relatively uniform gradient behavior observed in shallow models. As a result, prior works struggle to converge with standard optimization techniques, even in the absence of DP mechanisms. To the best of our knowledge, no existing work establishes a competitive, practical recipe for FL with DP in the context of ASR. To address this gap, we establish the first benchmark for FL with DP in end-to-end ASR. Our approach centers on per-layer clipping and layer-wise gradient normalization: theoretical analysis…
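The per-layer clipping mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`clip_per_layer`, `dp_aggregate`), the uniform split of the clipping budget across layers (each layer gets `total_clip / sqrt(L)` so the whole update stays within `total_clip`), and the toy two-layer updates are all assumptions made for illustration. The server-side step is the generic Gaussian mechanism on the sum of clipped client updates.

```python
import math
import random


def l2_norm(vec):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clip_per_layer(update, total_clip):
    """Clip each layer of one client's update separately.

    `update` maps a layer name to a flat list of floats.
    Assumption: the budget is split uniformly, so each of the L layers is
    clipped to total_clip / sqrt(L) and the full update has norm <= total_clip.
    """
    per_layer_clip = total_clip / math.sqrt(len(update))
    clipped = {}
    for name, delta in update.items():
        norm = l2_norm(delta)
        # Scale down only when the layer exceeds its budget.
        scale = min(1.0, per_layer_clip / norm) if norm > 0 else 1.0
        clipped[name] = [x * scale for x in delta]
    return clipped


def dp_aggregate(client_updates, total_clip, noise_multiplier, rng):
    """Sum clipped client updates, add Gaussian noise, then average.

    The noise standard deviation is noise_multiplier * total_clip, i.e. it is
    calibrated to the per-user sensitivity of the summed update.
    """
    n = len(client_updates)
    clipped = [clip_per_layer(u, total_clip) for u in client_updates]
    sigma = noise_multiplier * total_clip
    aggregate = {}
    for name in clipped[0]:
        dim = len(clipped[0][name])
        summed = [sum(c[name][i] for c in clipped) for i in range(dim)]
        aggregate[name] = [(s + rng.gauss(0.0, sigma)) / n for s in summed]
    return aggregate


# Toy example: the "encoder" layer has much larger updates than the "head".
updates = [
    {"encoder": [3.0, 4.0], "head": [0.03, 0.04]},
    {"encoder": [-6.0, 8.0], "head": [0.0, 0.01]},
]
clipped_first = clip_per_layer(updates[0], total_clip=1.0)
noisy_mean = dp_aggregate(updates, total_clip=1.0, noise_multiplier=0.5,
                          rng=random.Random(0))
```

With a single global clip, the large `encoder` update would dominate the norm and the small `head` update would be scaled down along with it. Clipping per layer leaves the `head` update untouched here, which is one intuition for why layer-wise treatment helps when gradient magnitudes differ strongly across layers.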
Taxonomy
Topics: Privacy-Preserving Technologies in Data
Methods: Attentive Walk-Aggregating Graph Neural Network
