Importance of Smoothness Induced by Optimizers in FL4ASR: Towards Understanding Federated Learning for End-to-End ASR

Sheikh Shams Azam; Tatiana Likhomanenko; Martin Pelikan; Jan "Honza" Silovsky

arXiv:2309.13102·eess.AS·September 26, 2023·1 cites

TL;DR

This paper investigates how optimizer-induced smoothness affects federated learning for end-to-end speech recognition, highlighting key factors influencing performance and proposing best practices for effective training.

Contribution

The paper systematically analyzes how optimizer choice, hyperparameters, and training setup affect federated learning for ASR, and offers practical guidelines based on that analysis.

Findings

1. Adaptive optimizers induce varying levels of smoothness, which affects model performance.

2. Proper hyperparameter tuning significantly reduces the performance gap between FL and centralized training.

3. Certain training configurations and optimizer choices lead to more robust federated ASR models.

Abstract

In this paper, we start by training End-to-End Automatic Speech Recognition (ASR) models using Federated Learning (FL) and examining the fundamental considerations that can be pivotal in minimizing the performance gap in terms of word error rate between models trained using FL versus their centralized counterpart. Specifically, we study the effect of (i) adaptive optimizers, (ii) loss characteristics via altering Connectionist Temporal Classification (CTC) weight, (iii) model initialization through seed start, (iv) carrying over modeling setup from experiences in centralized training to FL, e.g., pre-layer or post-layer normalization, and (v) FL-specific hyperparameters, such as number of local epochs, client sampling size, and learning rate scheduler, specifically for ASR under heterogeneous data distribution. We shed light on how some optimizers work better than others via inducing…
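
To make the FL-specific knobs named in the abstract concrete (number of local epochs, client sampling size, and an adaptive optimizer), here is a minimal sketch of one possible federated training loop. It is not the paper's implementation. It assumes an Adam-style update applied on the server to the averaged client weight deltas, and it replaces the ASR model with a toy quadratic loss per client. All names (`local_update`, `run_federated`, `client_targets`) and all default values are hypothetical.

```python
import math
import random


def local_update(weights, target, local_epochs, lr):
    """Run plain gradient steps on the toy loss 0.5 * ||w - target||^2.

    Returns the weight delta (local weights minus the starting weights).
    """
    w = list(weights)
    for _ in range(local_epochs):
        # Gradient of the toy loss with respect to w is (w - target).
        w = [wi - lr * (wi - ti) for wi, ti in zip(w, target)]
    return [wi - w0 for wi, w0 in zip(w, weights)]


def run_federated(client_targets, rounds=200, cohort_size=2, local_epochs=2,
                  client_lr=0.1, server_lr=0.05, beta1=0.9, beta2=0.99,
                  eps=1e-3, seed=0):
    """Toy federated loop with an Adam-style server step (an assumption)."""
    rng = random.Random(seed)
    dim = len(client_targets[0])
    w = [0.0] * dim   # global model weights
    m = [0.0] * dim   # server first-moment estimate
    v = [0.0] * dim   # server second-moment estimate
    for _ in range(rounds):
        # Client sampling: draw a cohort without replacement.
        cohort = rng.sample(range(len(client_targets)), cohort_size)
        deltas = [local_update(w, client_targets[k], local_epochs, client_lr)
                  for k in cohort]
        for i in range(dim):
            # Negated average delta acts as a pseudo-gradient for the server.
            g = -sum(d[i] for d in deltas) / cohort_size
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            w[i] -= server_lr * m[i] / (math.sqrt(v[i]) + eps)
    return w


# Heterogeneous clients: each one pulls the model toward a different target.
targets = [[1.0, -2.0], [3.0, 0.0], [-1.0, 1.0], [1.0, 5.0]]
final_weights = run_federated(targets, cohort_size=4)
```

With every client participating in each round, the global weights settle near the mean of the client targets. Reducing `cohort_size` or raising `local_epochs` is a simple way to see how client sampling and local computation change the trajectory in this toy setting.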

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing