FedPop: Federated Population-based Hyperparameter Tuning
Haokun Chen, Denis Krompass, Jindong Gu, Volker Tresp

TL;DR
FedPop introduces a novel population-based hyperparameter tuning algorithm for federated learning, enabling efficient, online optimization of diverse hyperparameters on both client and server sides, significantly improving performance over existing methods.
Contribution
The paper proposes FedPop, a new online hyperparameter tuning method using evolutionary algorithms for federated learning, addressing limitations of prior approaches and supporting broader hyperparameter types.
Findings
FedPop outperforms state-of-the-art tuning methods on FL benchmarks.
It effectively handles complex, real-world FL datasets like Non-IID ImageNet-1K.
The method demonstrates computational efficiency and broad hyperparameter exploration.
Abstract
Federated Learning (FL) is a distributed machine learning (ML) paradigm, in which multiple clients collaboratively train ML models without centralizing their local data. Similar to conventional ML pipelines, the client local optimization and server aggregation procedure in FL are sensitive to the hyperparameter (HP) selection. Despite extensive research on tuning HPs for centralized ML, these methods yield suboptimal results when employed in FL. This is mainly because their "training-after-tuning" framework is unsuitable for FL with limited client computation power. While some approaches have been proposed for HP-Tuning in FL, they are limited to the HPs for client local updates. In this work, we propose a novel HP-tuning algorithm, called Federated Population-based Hyperparameter Tuning (FedPop), to address this vital yet challenging problem. FedPop employs population-based…
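The abstract is cut off before it describes the population-based mechanism, so the following is only a minimal sketch of a generic exploit/explore step in the style of population-based training, not the authors' algorithm. The member layout (`hp`, `weights`, `score`), the function name `pbt_step`, the replacement fraction, and the perturbation factors are all assumptions made for illustration.

```python
import copy
import random


def pbt_step(population, rng, frac=0.25, factors=(0.8, 1.2)):
    """Run one exploit/explore step in place; lower score (validation loss) is better."""
    ranked = sorted(population, key=lambda member: member["score"])
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]
    for loser in bottom:
        winner = rng.choice(top)
        if winner is loser:
            continue  # a member never copies from itself
        # Exploit: copy the better member's model weights.
        loser["weights"] = copy.deepcopy(winner["weights"])
        # Explore: perturb each copied hyperparameter by a random factor.
        loser["hp"] = {
            name: value * rng.choice(factors)
            for name, value in winner["hp"].items()
        }
    return population


# Hypothetical population of four members, each with one hyperparameter.
population = [
    {"hp": {"lr": 0.1}, "weights": [1.0], "score": 0.1},
    {"hp": {"lr": 0.01}, "weights": [2.0], "score": 0.5},
    {"hp": {"lr": 1.0}, "weights": [3.0], "score": 0.9},
    {"hp": {"lr": 0.05}, "weights": [4.0], "score": 0.3},
]
pbt_step(population, random.Random(0))
```

In this sketch the worst member inherits the best member's weights and a perturbed copy of its hyperparameters, so tuning continues alongside training instead of restarting from scratch. The replaced member's `score` is stale after the step and would need to be re-evaluated before the next one.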
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
1. The problem tackled in this paper is well-motivated. Improving computational efficiency is an important and difficult problem for HPO in FL, since each run of the FL process is expensive. 2. The organization of this paper is good. The proposed method is clearly described and easy to follow.
1. RS and SHA are two very simple baselines that are not enough to demonstrate the significance of the proposed method. Some other SOTA methods such as Hyperband and BOHB should be compared. In Section 4.4, the convergence analysis is only done over RS and FedPop. Why are the learning curves of other baselines in Table 1 not shown in Figure 5? 2. Some claims of this paper are not well supported. For example, - The authors highlighted several times that FedPop can be conducted in a parallel and
The paper is well-motivated and strikes me as a well-executed extension of Jaderberg et al. (2017) to the federated setting. The problem of federated hyperparameter tuning is highly relevant, especially since in cross-device FL we cannot assume that the training data and setup are repeatable across different runs. Making maximal use of parallel tuning processes through population-based evolution is a good approach. I especially enjoy the consideration of iid vs. non-iid and discussion surrounding ev
The paper describes an algorithm with a lot of hyperparameters, and training and evaluation settings with a lot of specified details across different datasets. As a reader, it is hard to keep an overview of the exact settings, assumptions and baselines as well as the choice of those hyperparameters (e.g. number of clients, Dirichlet sampling prob, R_c, R_t, N_c, search-space, annealing rates and much more). I would highly appreciate a detailed table of required parameters to reproduce experiments and
The proposed FedPop algorithm is able to handle both server-side and client-side HPs, and does not require a final federated model training, further extending what we can do with HPO in the FL setting without relatively high communication overhead. The empirical evaluation utilizes various problem setups such as IID and non-IID per-client distributions, a large-scale cross-silo setup and a large-scale ImageNet-1K training. The performance of the proposed FedPop is extremely favorable compared to the
To the best of my understanding, FedPop is only applicable to optimizer-related hyperparameters (learning rate, momentum, etc.) and not to architectural hyperparameters (such as number of layers or activation choices), much like FedEx, as it relies on the "weight-sharing" assumption, and thus is limited in scope. Section 2, page 2 (end of paragraph on "Hyperparameter Tuning for FL System") mentions "it does not impose any restriction on [...] model architecture". But it seems that the architecture
1. The tuning-while-training framework is important for FL due to its high efficiency. 2. Experiments are conducted on real-world cross-silo FL applications.
1. There is no explanation of why RS initializes $N_c = R_t / R_c$. 2. The average validation loss of all active clients changes as each HP-configuration converges. Decoupling the effect of convergence from that of the HP-configuration is important for implementing tuning-while-training. A smaller validation loss may come from convergence rather than from a good HP-configuration. However, the paper does not consider this problem. Thus, it is hard to believe the proposed method can actually tu
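One plausible reading of the $N_c = R_t / R_c$ initialization, offered here as an assumption since this excerpt does not define the symbols: if $R_t$ is the total communication-round budget and $R_c$ is the number of rounds spent on each configuration, then random search can fully train $R_t / R_c$ configurations. The sketch below only illustrates that arithmetic; the function name and the budget values are hypothetical.

```python
def num_random_search_configs(total_rounds, rounds_per_config):
    """Number of configurations that fit in a fixed round budget.

    Assumes total_rounds plays the role of R_t and rounds_per_config
    the role of R_c; both interpretations are assumptions.
    """
    if total_rounds < 0 or rounds_per_config <= 0:
        raise ValueError("budgets must be non-negative and per-config rounds positive")
    # Integer division: a partially trained configuration is not counted.
    return total_rounds // rounds_per_config


# Hypothetical budgets, not taken from the paper.
n_c = num_random_search_configs(total_rounds=4000, rounds_per_config=800)
```

Under this reading, a larger per-configuration budget directly reduces how many configurations random search can explore.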
Hyperparameter optimization in federated learning is a relevant and challenging problem, and the application of evolutionary algorithms in this context is indeed novel, to the best of my knowledge. The main ideas of the paper are well motivated and supported by strong and extensive empirical results in a number of relevant benchmarks. Moreover, the paper is very well written and easy to read.
While the main idea of the paper is novel, I do not think it is particularly innovative, since this seems to be a straightforward application of evolutionary algorithms to hyperparameter optimization. The experimental evaluation, albeit extensive and well designed, is not entirely clear in a few points, as I pointed out in the questions below. Minor points: - In “Afterwards, we randomly sample addition K HP-vectors”, I believe the authors meant to say “additional”. - In Section 3.4.4., the sent
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Recommender Systems and Techniques · Traffic Prediction and Management Techniques
