On the Outsized Importance of Learning Rates in Local Update Methods

Zachary Charles, Jakub Konečný

arXiv:2007.00878 · cs.LG · July 3, 2020 · 20 cites

TL;DR

This paper analyzes local update methods in federated learning and meta-learning, revealing the critical role of learning rates in convergence and proposing a practical automatic learning rate decay method to improve performance.
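The TL;DR mentions an automatic learning rate decay method but does not state its rule. As an illustration only, the sketch below implements a generic plateau-based decay. This is a common technique swapped in here and is not necessarily the rule used in the paper or its repository. The class name `PlateauDecay` and its parameters (`decay_factor`, `patience`, `min_delta`, `min_lr`) are hypothetical. The learning rate is multiplied by `decay_factor` whenever the monitored loss fails to improve by at least `min_delta` for `patience` consecutive rounds.

```python
from dataclasses import dataclass


@dataclass
class PlateauDecay:
    """Hypothetical plateau-based learning rate decay (not the paper's exact rule)."""

    lr: float
    decay_factor: float = 0.5
    patience: int = 3
    min_delta: float = 1e-4
    min_lr: float = 0.0
    best: float = float("inf")
    wait: int = 0

    def update(self, loss: float) -> float:
        """Record one round's loss and return the (possibly decayed) learning rate."""
        if loss < self.best - self.min_delta:
            # Meaningful improvement: remember the new best and reset the counter.
            self.best = loss
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                # Plateau detected: shrink the learning rate, but not below min_lr.
                self.lr = max(self.lr * self.decay_factor, self.min_lr)
                self.wait = 0
        return self.lr


schedule = PlateauDecay(lr=0.1, patience=2)
for round_loss in [1.0, 0.5, 0.5, 0.5, 0.5, 0.5]:
    print(schedule.update(round_loss))
```

In a local update method, one might call `update` once per communication round with a training or validation loss and use the returned value as the client learning rate. Both the monitored signal and the choice of which learning rate to decay are assumptions here.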

Contribution

It provides a theoretical characterization of local update methods for quadratic objectives and introduces a new automatic learning rate decay technique.
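For intuition about the quadratic characterization, the following is our own sketch of the simplest case: K full-batch local gradient descent steps with client learning rate γ on quadratic client losses, with client weights p_i summing to 1. The paper's general statement (stochastic gradients, other local optimizers) may differ in form and constants. Each client's update is a linear map of the distance to its own minimizer. Because Q_i is a polynomial in the symmetric matrix A_i, the averaged update is the exact gradient of a surrogate loss.

```latex
\begin{aligned}
f_i(x) &= \tfrac12\, x^\top A_i x - b_i^\top x, \qquad x_i^* = A_i^{-1} b_i,\\
x_{k+1} &= x_k - \gamma\,(A_i x_k - b_i)
  \;\Longrightarrow\; x_k - x_i^* = (I - \gamma A_i)^k\,(x_0 - x_i^*),\\
\Delta_i(x_0) &= x_0 - x_K = Q_i\,(x_0 - x_i^*), \qquad Q_i = I - (I - \gamma A_i)^K,\\
\sum_i p_i\,\Delta_i(x) &= \nabla \tilde f(x), \qquad
  \tilde f(x) = \tfrac12 \sum_i p_i\,(x - x_i^*)^\top Q_i\,(x - x_i^*),\\
\tilde x &= \Big(\sum_i p_i Q_i\Big)^{-1} \sum_i p_i Q_i\, x_i^*
  \quad\text{vs.}\quad
  x^* = \Big(\sum_i p_i A_i\Big)^{-1} \sum_i p_i A_i\, x_i^*.
\end{aligned}
```

Each eigenvalue λ of A_i maps to the eigenvalue 1 - (1 - γλ)^K of Q_i. As γ → 0, Q_i ≈ γK A_i, so the surrogate keeps the conditioning of the original problem and its minimizer approaches x^*. For larger γ with γλ ≤ 1, the eigenvalues are pushed toward 1. This improves conditioning but generally moves the surrogate minimizer away from x^*.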

Findings

1. Proper learning rate tuning can achieve near-optimal results in communication-limited settings.
2. The choice of client learning rate affects the surrogate loss's condition number and alignment with the true loss.
3. The proposed automatic learning rate decay improves empirical performance across various tasks.
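The second finding can be made concrete on a toy problem. The sketch below makes several assumptions of its own: diagonal quadratic client losses f_i(x) = ½ Σ_j a_ij (x_j - x*_ij)², two clients with made-up curvatures and minimizers, equal weights, and K = 10 full-batch local gradient steps. These steps produce a per-client update Q_i (x - x*_i) with Q_i = I - (I - γA_i)^K, a short calculation under these assumptions; the paper's surrogate may be scaled differently. For each client learning rate γ, the sketch reports the condition number of the surrogate Hessian Σ p_i Q_i and the distance between the surrogate and true minimizers. The names `q`, `weighted_minimizer`, and `tradeoff` and all data values are hypothetical.

```python
# Hypothetical toy data (not from the paper): per-coordinate curvatures a_ij
# and client minimizers x*_ij for two diagonal quadratic clients.
CURVATURES = [[1.0, 0.01], [0.5, 0.02]]
MINIMIZERS = [[1.0, 1.0], [-1.0, -1.0]]
WEIGHTS = [0.5, 0.5]
LOCAL_STEPS = 10


def q(a: float, gamma: float, k: int) -> float:
    """Eigenvalue of Q = I - (I - gamma*A)^k for one curvature a of A."""
    return 1.0 - (1.0 - gamma * a) ** k


def weighted_minimizer(coeffs):
    """Per-coordinate minimizer of sum_i p_i * c_ij * (x_j - x*_ij)^2."""
    out = []
    for j in range(len(MINIMIZERS[0])):
        num = sum(p * c[j] * m[j] for p, c, m in zip(WEIGHTS, coeffs, MINIMIZERS))
        den = sum(p * c[j] for p, c in zip(WEIGHTS, coeffs))
        out.append(num / den)
    return out


def tradeoff(gamma: float, k: int = LOCAL_STEPS):
    """Return (condition number of surrogate Hessian, distance to true minimizer)."""
    q_coeffs = [[q(a, gamma, k) for a in row] for row in CURVATURES]
    # Surrogate Hessian sum_i p_i Q_i is diagonal here, so its entries are its eigenvalues.
    hessian = [sum(p * c[j] for p, c in zip(WEIGHTS, q_coeffs))
               for j in range(len(MINIMIZERS[0]))]
    kappa = max(hessian) / min(hessian)
    x_true = weighted_minimizer(CURVATURES)  # minimizer of the true average loss
    x_surr = weighted_minimizer(q_coeffs)    # minimizer of the surrogate loss
    dist = sum((s - t) ** 2 for s, t in zip(x_surr, x_true)) ** 0.5
    return kappa, dist


for gamma in (0.01, 0.1, 1.0):
    kappa, dist = tradeoff(gamma)
    print(f"gamma={gamma:<5} kappa={kappa:6.2f} dist={dist:.4f}")
```

The true average loss in this toy has condition number 50. As γ grows, the surrogate's condition number falls well below that, while its minimizer drifts away from the true one.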

Abstract

We study a family of algorithms, which we refer to as local update methods, that generalize many federated learning and meta-learning algorithms. We prove that for quadratic objectives, local update methods perform stochastic gradient descent on a surrogate loss function which we exactly characterize. We show that the choice of client learning rate controls the condition number of that surrogate loss, as well as the distance between the minimizers of the surrogate and true loss functions. We use this theory to derive novel convergence rates for federated averaging that showcase this trade-off between the condition number of the surrogate loss and its alignment with the true loss function. We validate our results empirically, showing that in communication-limited settings, proper learning rate tuning is often sufficient to reach near-optimal behavior. We also present a practical method…
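The abstract's central claim is that local update methods run gradient descent on a surrogate loss. This can be checked numerically in the simplest deterministic case. The sketch below runs FedAvg-style rounds with explicit local gradient descent on hypothetical diagonal quadratic clients. It assumes full-batch gradients, equal weights, and a server learning rate of 1; all data and the names `client_update`, `fedavg`, and `minimizer` are ours. It then compares the final iterate with the minimizer of the surrogate built from Q_i = I - (I - γA_i)^K and with the minimizer of the true average loss.

```python
# Hypothetical toy setup (not from the paper): diagonal quadratic clients
# f_i(x) = 0.5 * sum_j a_ij * (x_j - x*_ij)^2 with equal weights.
CURVATURES = [[1.0, 0.01], [0.5, 0.02]]
MINIMIZERS = [[1.0, 1.0], [-1.0, -1.0]]
WEIGHTS = [0.5, 0.5]


def client_update(x, a, xstar, gamma, k):
    """Run k full-batch local GD steps from x and return the delta x - x_k."""
    y = list(x)
    for _ in range(k):
        # Gradient of f_i at y is a_ij * (y_j - x*_ij), coordinate-wise.
        y = [yj - gamma * aj * (yj - mj) for yj, aj, mj in zip(y, a, xstar)]
    return [xj - yj for xj, yj in zip(x, y)]


def fedavg(gamma, k=10, rounds=2000, server_lr=1.0):
    """FedAvg-style loop: average client deltas, then take a server step."""
    x = [0.0, 0.0]
    for _ in range(rounds):
        deltas = [client_update(x, a, m, gamma, k)
                  for a, m in zip(CURVATURES, MINIMIZERS)]
        avg = [sum(p * d[j] for p, d in zip(WEIGHTS, deltas)) for j in range(len(x))]
        x = [xj - server_lr * dj for xj, dj in zip(x, avg)]
    return x


def minimizer(coeffs):
    """Per-coordinate minimizer of sum_i p_i * c_ij * (x_j - x*_ij)^2."""
    dim = len(MINIMIZERS[0])
    return [sum(p * c[j] * m[j] for p, c, m in zip(WEIGHTS, coeffs, MINIMIZERS))
            / sum(p * c[j] for p, c in zip(WEIGHTS, coeffs)) for j in range(dim)]


gamma, k = 0.1, 10
q_coeffs = [[1.0 - (1.0 - gamma * aj) ** k for aj in a] for a in CURVATURES]
x_fedavg = fedavg(gamma, k)
x_surrogate = minimizer(q_coeffs)  # minimizer of the surrogate loss
x_true = minimizer(CURVATURES)     # minimizer of the true average loss
print(x_fedavg, x_surrogate, x_true)
```

With a fixed client learning rate of 0.1, the iterate settles at the surrogate minimizer and stays visibly away from the true one. Shrinking `gamma` narrows that gap but slows progress per round, which matches the trade-off the abstract describes.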

Code & Models

Repositories

google-research/federated/tree/master/adaptive_lr_decay
TensorFlow · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning

Methods: Model-Agnostic Meta-Learning