Scaff-PD: Communication Efficient Fair and Robust Federated Learning

Yaodong Yu, Sai Praneeth Karimireddy, Yi Ma, Michael I. Jordan

arXiv:2307.13381·cs.LG·July 26, 2023

Open Access · 3 Reviews

TL;DR

Scaff-PD is a novel federated learning algorithm that enhances fairness and robustness across heterogeneous clients by optimizing distributionally robust objectives, achieving faster convergence and reduced communication costs.

Contribution

It introduces an accelerated primal dual algorithm with bias correction for efficient, fair, and robust federated learning in heterogeneous environments.

Findings

1. Improves fairness and robustness in federated learning.
2. Reduces communication costs significantly.
3. Maintains competitive accuracy across benchmarks.

Abstract

We present Scaff-PD, a fast and communication-efficient algorithm for distributionally robust federated learning. Our approach improves fairness by optimizing a family of distributionally robust objectives tailored to heterogeneous clients. We leverage the special structure of these objectives, and design an accelerated primal dual (APD) algorithm which uses bias corrected local steps (as in Scaffold) to achieve significant gains in communication efficiency and convergence speed. We evaluate Scaff-PD on several benchmark datasets and demonstrate its effectiveness in improving fairness and robustness while maintaining competitive accuracy. Our results suggest that Scaff-PD is a promising approach for federated learning in resource-constrained and heterogeneous settings.
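To make the idea of a distributionally robust objective concrete, the following is a minimal sketch, not the Scaff-PD algorithm itself. It assumes a saddle-point objective of the form min over x, max over λ in the probability simplex, of Σ_i λ_i f_i(x) − ψ(λ), and solves a toy instance with plain projected gradient descent-ascent (no acceleration, no local steps, no bias correction). The quadratic client losses, the regularizer ψ(λ) = (ρ/2)‖λ − 1/n‖², the step sizes, and the function names `project_simplex` and `toy_dro` are all illustrative assumptions.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    # Largest index whose sorted entry stays positive after shifting.
    idx = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[idx] - 1.0) / (idx + 1)
    return np.maximum(v - theta, 0.0)


def toy_dro(targets, rho=1.0, lr_x=0.01, lr_lam=0.01, rounds=5000):
    """Projected gradient descent-ascent on a toy robust objective.

    Client i has the (assumed) loss f_i(x) = 0.5 * (x - targets[i])**2.
    The dual regularizer is the (assumed) psi(lam) = rho/2 * ||lam - 1/n||^2.
    """
    n = len(targets)
    x = 0.0
    lam = np.full(n, 1.0 / n)  # start from uniform client weights
    for _ in range(rounds):
        losses = 0.5 * (x - targets) ** 2
        grads = x - targets
        # Primal step: descend on the lam-weighted client gradients.
        x_new = x - lr_x * float(lam @ grads)
        # Dual step: ascend on the losses, then project back to the simplex.
        lam = project_simplex(lam + lr_lam * (losses - rho * (lam - 1.0 / n)))
        x = x_new
    return x, lam


targets = np.array([0.0, 0.0, 0.0, 4.0])  # three similar clients, one outlier
x_robust, lam = toy_dro(targets)
x_avg = targets.mean()  # minimizer of the plain average loss

print("robust x:", x_robust, "weights:", lam)
print("worst loss (robust): ", np.max(0.5 * (x_robust - targets) ** 2))
print("worst loss (average):", np.max(0.5 * (x_avg - targets) ** 2))
```

In this toy setting the dual weights shift toward the outlier client, which pulls the solution away from the plain average and lowers the worst client loss. That is the fairness effect the abstract refers to; the communication savings come from parts of the method this sketch omits.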

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

1. The new method solves the DRO problem via a saddle-point reformulation.
2. The combination of the two techniques tackles the issue of data heterogeneity.
3. Scaff-PD achieves better rates than previous methods, and the experiments support this.

Weaknesses

1. The first point concerns the paragraph about choosing $\psi$ and $\Lambda$. In the convergence analysis, $\Lambda$ is a bounded set, yet the main text does not discuss this.
2. The main text gives no expression for the local stepsize.
3. The statement of Theorem 5.5 is incomplete: it says nothing about the smoothness of the function $f$.
4. It is good that the authors compare ProxSkip and Scaff-PD theoretically; however, there are a lot of new algorithms of 5th gen

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. This paper is easy to follow.
2. Building upon the foundation of SCAFFOLD, a new algorithm is developed for addressing distributionally robust federated objectives, and its convergence rate is rigorously derived.

Weaknesses

1. The algorithm design and theoretical analysis rely on SCAFFOLD, including its assumptions and proof framework. This heavy reliance on prior work may diminish the originality and contribution of the proposed method.
2. A notable issue arises in the algorithm design: it requires two communications with the nodes in each round, each transmitting distinct content. This introduces a large communication overhead. Additionally, contradictory to the federated con

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

The problem that is studied is of interest to the federated learning community. The developed algorithm also seems to be able to achieve the desired objective in terms of the experimental performance.

Weaknesses

The technical proofs are not rigorous enough; details are given below. There are also a couple of typos and unclear definitions, likewise listed below. Major comments: 1. What is $\bar{\tau}$ in Condition 5.1, and how do you set $\gamma_0$? 2. Lemma B.2 is wrong and its proof is also wrong, which affects the soundness of Theorem B.6 and Theorem 5.1. Specifically, to prove \begin{equation} t_r(\frac{1}{\tau_r}+ \mu_{\boldsymbol{x}}) \geq \frac{t_{r+1}}{\tau_{r+1}}, \end{equati

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques