Asymmetrically Decentralized Federated Learning

Qinglun Li; Miao Zhang; Nan Yin; Quanjun Yin; Li Shen

arXiv:2310.05093·cs.LG·October 10, 2023·2 cites

Open Access · 4 Reviews

TL;DR

This paper introduces DFedSGPSM, an asymmetric topology-based decentralized federated learning algorithm that uses Push-Sum, SAM, and local momentum to improve convergence and performance in heterogeneous environments.

Contribution

The paper proposes a novel asymmetric topology-based DFL algorithm combining Push-Sum, SAM, and local momentum, with theoretical convergence guarantees and empirical superiority.
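As a rough illustration of two of the ingredients named above, the sketch below applies a SAM-style update with heavy-ball local momentum to a toy quadratic. It is not the authors' implementation: the objective, the function names (`loss`, `grad`, `sam_momentum_step`), the hyperparameters (`lr`, `rho`, `alpha`), and the momentum form `v = alpha * v + g` are all assumptions chosen for illustration.

```python
import math

def loss(w, curvature):
    # Toy objective f(w) = 0.5 * sum(c_k * w_k^2); an assumption, not from the paper.
    return 0.5 * sum(c * wk * wk for c, wk in zip(curvature, w))

def grad(w, curvature):
    # Exact gradient of the toy objective above.
    return [c * wk for c, wk in zip(curvature, w)]

def sam_momentum_step(w, v, curvature, lr=0.05, rho=0.05, alpha=0.5):
    """One SAM-style step followed by a heavy-ball momentum update."""
    g = grad(w, curvature)
    norm = math.sqrt(sum(gk * gk for gk in g)) or 1.0  # avoid dividing by zero
    # SAM: move to the approximate worst-case point within an L2 ball of radius rho.
    w_adv = [wk + rho * gk / norm for wk, gk in zip(w, g)]
    # Use the gradient at the perturbed point as the descent direction.
    g_adv = grad(w_adv, curvature)
    # Local momentum (assumed heavy-ball form).
    v = [alpha * vk + gk for vk, gk in zip(v, g_adv)]
    w = [wk - lr * vk for wk, vk in zip(w, v)]
    return w, v

curvature = [1.0, 4.0]
w, v = [2.0, -1.5], [0.0, 0.0]
initial_loss = loss(w, curvature)
for _ in range(200):
    w, v = sam_momentum_step(w, v, curvature)
final_loss = loss(w, curvature)
```

Because the SAM gradient is evaluated at a perturbed point, the iterate settles into a small neighbourhood of the minimizer rather than reaching it exactly; the size of that neighbourhood scales with `rho`.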

Findings

1. Achieves a convergence rate of O(1/√T) in non-convex settings.
2. Outperforms state-of-the-art optimizers on MNIST, CIFAR10, and CIFAR100.
3. Better topological connectivity leads to tighter convergence bounds.
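To make the O(1/√T) finding concrete, the worked step below assumes (as is common in non-convex analyses, though the exact measure used in the paper is not shown here) that the rate bounds the average squared gradient norm at the averaged model $\bar{x}^{t}$. The constant $C_0$ is a placeholder, not a quantity taken from the paper.

```latex
\frac{1}{T}\sum_{t=0}^{T-1}\mathbb{E}\bigl\|\nabla f(\bar{x}^{t})\bigr\|^{2} \le \frac{C_0}{\sqrt{T}}
\quad\Longrightarrow\quad
\frac{C_0}{\sqrt{T}} \le \epsilon \iff T \ge \frac{C_0^{2}}{\epsilon^{2}}.
```

Under that assumption, halving the target accuracy $\epsilon$ quadruples the number of communication rounds required.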

Abstract

To address the communication burden and privacy concerns associated with the centralized server in Federated Learning (FL), Decentralized Federated Learning (DFL) has emerged, which discards the server with a peer-to-peer (P2P) communication framework. However, most existing DFL algorithms are based on symmetric topologies, such as ring and grid topologies, which can easily lead to deadlocks and are susceptible to the impact of network link quality in practice. To address these issues, this paper proposes the DFedSGPSM algorithm, which is based on asymmetric topologies and utilizes the Push-Sum protocol to effectively solve consensus optimization problems. To further improve algorithm performance and alleviate local heterogeneous overfitting in FL, our algorithm combines the Sharpness Aware Minimization (SAM) optimizer and local momentum. The SAM optimizer employs…
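The Push-Sum protocol mentioned in the abstract can be sketched on a small directed graph as follows. This is a minimal, generic Push-Sum averaging example, not the DFedSGPSM algorithm itself: the function name `push_sum`, the four-node graph, the equal-split weighting, and the round count are assumptions made for illustration.

```python
def push_sum(values, out_neighbors, rounds=200):
    """Generic Push-Sum averaging over a directed graph.

    values: initial scalar held by each node.
    out_neighbors: out_neighbors[i] lists the nodes that node i sends to
                   (each node includes itself here).
    Returns each node's de-biased estimate x_i / w_i.
    """
    n = len(values)
    x = list(values)   # push-sum numerators
    w = [1.0] * n      # push-sum weights, initialised to 1
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            targets = out_neighbors[i]
            share = 1.0 / len(targets)  # node i splits its mass equally (column-stochastic)
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    # Dividing by the weight corrects the bias caused by unequal in-degrees.
    return [xi / wi for xi, wi in zip(x, w)]

# Directed ring 0 -> 1 -> 2 -> 3 -> 0 with self-loops and one extra edge 0 -> 2,
# so the topology is asymmetric (a link i -> j does not imply j -> i).
out_neighbors = [[0, 1, 2], [1, 2], [2, 3], [3, 0]]
initial = [1.0, 2.0, 3.0, 6.0]
estimates = push_sum(initial, out_neighbors)
target = sum(initial) / len(initial)
```

Each node only needs to know its own out-neighbors, which is why the protocol suits directed (asymmetric) communication; on a strongly connected graph the ratios `x_i / w_i` approach the average of the initial values.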

Peer Reviews

Decision · ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. This paper provides a solid theoretical analysis for their proposed algorithm for asymmetric decentralized learning. 2. The authors conducted extensive experiments to verify their theoretical results and show the efficacy of the proposed DFedSGPSM.

Weaknesses

1. The novelty of this paper is limited. 2. The theoretical results are inadequate. Please see the detailed comments and questions below.

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 5

Strengths

The paper tackles a crucial problem of decentralized optimization with asynchronous communications over directed networks. This problem has many applications and challenges for more realistic optimization frameworks. The authors use the Push-Sum protocol to design an algorithm and analyze its convergence behavior and implications. They also evaluate the algorithm on standard vision tasks and show its numerical performance.

Weaknesses

The paper has some issues regarding the novelty and significance of the results. Firstly, the problem of asynchronous decentralized optimization has already been studied in [1,2,3] under a more general communication setting that accounts for message loss and delays. Those works also use the Push-Sum algorithm as the baseline and propose an algorithm with milder assumptions (removing the bounded gradient assumption) and better convergence properties. The paper does not compare or discuss its method with the

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. The paper explores decentralized federated learning with asymmetric communication under a controlled heterogeneous (non-IID) data setting. 2. The paper is easy to follow. 3. The paper provides theoretical analysis (a convergence result). 4. The paper provides an ablation study covering the three combined methods.

Weaknesses

1. Novelty is on the weaker side and incremental, since the paper combines existing methods. It is not surprising that combining these methods improves performance. 2. Each client can choose $N_i^{out}$ in asymmetric DFL, but the results show that performance does not differ significantly between rule-based and random $N_i^{out}$ selection. Flexibility is the advantage (over symmetric topologies) claimed by the paper. However, this is an unsatisfactory reason if it is related to topological connectivity. 3.

Reviewer 04 · Rating 3 (reject, not good enough) · Confidence 5

Strengths

1. They provide theoretical convergence results for their proposed DFedSGPSM and discuss the impacts of some important problem-related parameters such as $C$, $q$ and $L$ on the convergence rate. 2. They conduct extensive experiments, and the ablation study provides a detailed discussion in different settings. 3. The paper is generally well written and easy to follow.

Weaknesses

1. The novelty of the proposed algorithm is limited. It seems that DFedSGPSM is a combination of DFedSAM [1] with the Push-Sum protocol [2] and a momentum mechanism, and the convergence analysis can thus be directly extended by combining existing theoretical analysis methods. The authors should comment on this and clarify what the new technical challenge in their convergence analysis is. 2. According to Corollary 1, DFedSGPSM can converge at a rate of $\mathcal{O}(1/\sqrt{T})$. However, existing decent

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Mobile Ad Hoc Networks · Stochastic Gradient Optimization Techniques

Methods: Sharpness Aware Minimization · Attentive Walk-Aggregating Graph Neural Network