BAFFLE: A Baseline of Backpropagation-Free Federated Learning

Haozhe Feng; Tianyu Pang; Chao Du; Wei Chen; Shuicheng Yan; Min Lin

arXiv:2301.12195 · cs.LG · July 23, 2024 · 6 cites

TL;DR

BAFFLE introduces a backpropagation-free federated learning method that replaces gradient computation with multiple forward passes, reducing resource use and enhancing security on edge devices.

Contribution

It proposes a novel federated learning approach that eliminates backpropagation, making training more efficient and secure for edge devices.

Findings

01 Achieves acceptable training results on deep models.

02 Compatible with hardware optimization and model pruning.

03 Reduces computational and storage overhead.

Abstract

Federated learning (FL) is a general principle for decentralized clients to train a server model collectively without sharing local data. FL is a promising framework with practical applications, but its standard training paradigm requires the clients to backpropagate through the model to compute gradients. Since these clients are typically edge devices and not fully trusted, executing backpropagation on them incurs computational and storage overhead as well as white-box vulnerability. In light of this, we develop backpropagation-free federated learning, dubbed BAFFLE, in which backpropagation is replaced by multiple forward processes to estimate gradients. BAFFLE is 1) memory-efficient and easily fits uploading bandwidth; 2) compatible with inference-only hardware optimization and model quantization or pruning; and 3) well-suited to trusted execution environments, because the clients in…
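The abstract's core idea, estimating gradients from forward passes alone, can be illustrated with a generic zeroth-order forward-difference estimator. The sketch below is not taken from the paper or its repository: the function names (`estimate_gradient`, `quadratic_loss`), the Gaussian perturbations, the toy quadratic loss, and all hyperparameters are assumptions chosen for illustration, and the federated parts (multiple clients, server aggregation) are omitted.

```python
import random


def estimate_gradient(loss_fn, weights, num_samples=200, sigma=1e-3, seed=0):
    """Estimate the gradient of loss_fn at weights using forward evaluations only.

    Averages delta * (loss(w + sigma * delta) - loss(w)) / sigma over
    Gaussian perturbations delta. No backpropagation is involved.
    """
    rng = random.Random(seed)
    base_loss = loss_fn(weights)  # one forward pass at the unperturbed point
    grad = [0.0] * len(weights)
    for _ in range(num_samples):
        delta = [rng.gauss(0.0, 1.0) for _ in weights]
        perturbed = [w + sigma * d for w, d in zip(weights, delta)]
        # One extra forward pass per perturbation gives a scalar loss difference.
        diff = (loss_fn(perturbed) - base_loss) / sigma
        for i, d in enumerate(delta):
            grad[i] += d * diff / num_samples
    return grad


def quadratic_loss(w):
    """Hypothetical toy loss whose true gradient is 2 * (w - target)."""
    target = [1.0, -2.0, 0.5]
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


# Plain gradient descent driven by the estimated gradient.
w = [0.0, 0.0, 0.0]
for step in range(200):
    g = estimate_gradient(quadratic_loss, w, num_samples=50, seed=step)
    w = [wi - 0.05 * gi for wi, gi in zip(w, g)]

print("final loss:", quadratic_loss(w))
```

Each perturbation contributes only a scalar loss difference, which is why an approach of this kind needs no stored activations for a backward pass. The cost is estimator variance, which grows with the parameter dimension and shrinks as more perturbations are averaged.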

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

1. The paper is well structured, and the methodology section is easy to follow.
2. The experiment section contains complete ablation studies for understanding the proposed method.

Weaknesses

1. The overall contribution is limited. Zeroth-order optimization in federated learning has already been studied, and applying secure aggregation is straightforward as well. The novelty of the theoretical analysis also seems limited. It would be better for the authors to summarize the comparison with, and algorithmic improvements over, FedZO (Fang et al., 2022).
2. The motivation for applying TEE in the proposed method is unclear. The paper only mentioned that TEE is memory-constraint

Reviewer 02 · Rating 3 (reject, not good enough) · Confidence 4

Strengths

Using zeroth-order optimization, this model replaces backpropagation with multiple forward or inference processes to obtain a stochastic estimate of the gradients. Since the model is backpropagation-free, the communication costs and the computational and storage overhead for clients can be reduced. Because of this, TEEs, which are highly memory-constrained, can also be applied.

Weaknesses

Some notation is not well defined where it is used. The organization of the paper can be improved. The BP baselines are not clearly identified in each experiment (FedAvg or FedSGD). The tested models are not large. A comparison between BAFFLE and state-of-the-art algorithms is missing. In the PRELIMINARIES section on zeroth-order FL, some numbers are provided to compare BAFFLE with FedZO, but in the EXPERIMENT section the comparison with other related work is not well presented. The robustness experiments are not convincing enough

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. The authors tackle an important problem, which is to avoid backpropagation in resource-limited federated learning clients.
2. The idea of the paper is easy to follow.

Weaknesses

1. It is not clear what exactly the theoretical results in Theorems 3.1 & 3.2 are saying. The authors should provide more detail on what the theorems state and why they guarantee the convergence of the algorithm. Moreover, are the authors assuming a strongly convex function or a non-convex loss function? Or does it hold for both?
2. In experiments, the authors show that BAFFLE is more resource efficient than full backpropagation, while achieving a lower accuracy, which is natural. Now

Code & Models

Repositories

fenghz/baffle (PyTorch, official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Cryptography and Data Security