Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping

Martin Pelikan; Sheikh Shams Azam; Vitaly Feldman; Jan "Honza" Silovsky; Kunal Talwar; Christopher G. Brinton; Tatiana Likhomanenko

arXiv:2310.00098 · cs.LG · November 27, 2025 · 5 cites

TL;DR

This paper introduces the first benchmark for federated learning with differential privacy in speech recognition, proposing layer-wise gradient normalization techniques to address gradient heterogeneity in large models, enabling practical privacy-preserving FL.

Contribution

It establishes a novel benchmark for DP-FL in ASR and develops layer-wise gradient normalization methods to improve convergence in large transformer models.

Findings

01. Achieves strong user-level differential privacy with minimal WER increase.

02. Demonstrates viability of DP-FL in ASR with large user populations.

03. Provides broader insights applicable to scalable privacy-preserving FL for large models.

Abstract

While federated learning (FL) and differential privacy (DP) have been extensively studied, their application to automatic speech recognition (ASR) remains largely unexplored due to the challenges in training large transformer models. Specifically, large models further exacerbate issues in FL as they are particularly susceptible to gradient heterogeneity across layers, unlike the relatively uniform gradient behavior observed in shallow models. As a result, prior works struggle to converge with standard optimization techniques, even in the absence of DP mechanisms. To the best of our knowledge, no existing work establishes a competitive, practical recipe for FL with DP in the context of ASR. To address this gap, we establish the first benchmark for FL with DP in end-to-end ASR. Our approach centers on per-layer clipping and layer-wise gradient normalization: theoretical analysis…
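To make the idea of per-layer clipping concrete, here is a minimal NumPy sketch of a single server aggregation step. It is an illustration under stated assumptions, not the paper's algorithm: the helper names `clip_per_layer` and `dp_aggregate`, the layer names, the clipping bounds, and the noise multiplier are all hypothetical, and the layer-wise gradient normalization mentioned in the abstract is not implemented. The sketch assumes each client sends one update per layer, clips every layer to its own L2 bound, and adds Gaussian noise calibrated to the resulting overall bound.

```python
import numpy as np


def clip_per_layer(update, clip_norms):
    """Rescale each layer's update so its L2 norm does not exceed that layer's bound."""
    clipped = {}
    for name, delta in update.items():
        norm = np.linalg.norm(delta)
        # Scale factor is 1 when the layer is already within its bound.
        scale = min(1.0, clip_norms[name] / (norm + 1e-12))
        clipped[name] = delta * scale
    return clipped


def dp_aggregate(client_updates, clip_norms, noise_multiplier, rng):
    """Average per-layer-clipped client updates and add Gaussian noise (hypothetical helper)."""
    clipped = [clip_per_layer(u, clip_norms) for u in client_updates]
    # After per-layer clipping, one client's full update has L2 norm at most
    # sqrt(sum of squared per-layer bounds); the noise is calibrated to that.
    sensitivity = np.sqrt(sum(c ** 2 for c in clip_norms.values()))
    sigma = noise_multiplier * sensitivity
    aggregate = {}
    for name in clip_norms:
        total = sum(c[name] for c in clipped)
        noise = rng.normal(0.0, sigma, size=total.shape)
        aggregate[name] = (total + noise) / len(clipped)
    return aggregate


# Toy usage: two clients, two "layers" with assumed bounds of 1.0 each.
rng = np.random.default_rng(0)
clip_norms = {"encoder": 1.0, "decoder": 1.0}
clients = [
    {"encoder": np.array([3.0, 4.0]), "decoder": np.array([0.1, 0.0])},
    {"encoder": np.array([0.3, 0.4]), "decoder": np.array([0.0, 2.0])},
]
noisy_update = dp_aggregate(clients, clip_norms, noise_multiplier=0.5, rng=rng)
```

Clipping each layer separately keeps a layer with large gradients from consuming the whole clipping budget, which is one plausible reading of why per-layer treatment helps when gradient scales differ across layers. A real deployment would also need privacy accounting across rounds, which this sketch omits.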

Videos

Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers, and Gradient Clipping · SlidesLive

Taxonomy

Topics: Privacy-Preserving Technologies in Data

Methods: Attentive Walk-Aggregating Graph Neural Network