Adaptive Decentralized Federated Learning for Robust Optimization
Shuyuan Wu, Feifei Wang, Yuan Gao, Rui Wang, Hansheng Wang

TL;DR
This paper introduces an adaptive decentralized federated learning method that dynamically adjusts client learning rates to improve robustness against noisy or malicious clients without requiring prior knowledge or large normal client groups.
Contribution
The paper proposes a novel adaptive DFL approach that adjusts client learning rates based on suspicion levels, eliminating the need for prior knowledge or large normal client groups.
Findings
Demonstrates superior robustness in numerical experiments.
Provides convergence guarantees without strict neighbor conditions.
Effectively mitigates impact of abnormal clients.
Abstract
In decentralized federated learning (DFL), the presence of abnormal clients, often caused by noisy or poisoned data, can significantly disrupt the learning process and degrade the overall robustness of the model. Previous methods on this issue often require a sufficiently large number of normal neighboring clients or prior knowledge of reliable clients, which reduces the practical applicability of DFL. To address these limitations, we develop here a novel adaptive DFL (aDFL) approach for robust estimation. The key idea is to adaptively adjust the learning rates of clients. By assigning smaller rates to suspicious clients and larger rates to normal clients, aDFL mitigates the negative impact of abnormal clients on the global model in a fully adaptive way. Our theory does not put any stringent conditions on neighboring nodes and requires no prior knowledge. A rigorous convergence analysis…
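The key idea in the abstract, smaller learning rates for suspicious clients and larger ones for normal clients, can be illustrated with a minimal Python sketch. It is not the paper's algorithm or its Equation 4.4. The reviews below mention gradient-norm-based weighting, the mapping $\pi(x) = \exp(-x)$, a tuning parameter $\lambda_n$, and an initial estimator; everything else here is an assumption made for illustration: the least-squares loss, the ring network, the coordinate-wise median of local fits used as a stand-in initial estimator, the combine-then-adapt update, and all names (`adaptive_rates`, `run_adfl_sketch`, `lam`, `base_lr`).

```python
import numpy as np

def local_gradient(theta, X, y):
    """Gradient of the least-squares loss (1/2n) * ||y - X theta||^2."""
    return X.T @ (X @ theta - y) / len(y)

def adaptive_rates(grad_norms, lam, base_lr):
    """Assumed rule: rate = base_lr * pi(lam * score), with pi(x) = exp(-x)."""
    return base_lr * np.exp(-lam * np.asarray(grad_norms))

def run_adfl_sketch(Xs, ys, W, theta_init, lam, base_lr=0.1, n_iter=300):
    M = len(Xs)
    # Suspicion score (assumption): local gradient norm at the initial estimate.
    norms = np.array([np.linalg.norm(local_gradient(theta_init, Xs[m], ys[m]))
                      for m in range(M)])
    rates = adaptive_rates(norms, lam, base_lr)
    thetas = np.tile(theta_init, (M, 1))   # one parameter vector per client
    for _ in range(n_iter):
        mixed = W @ thetas                 # average with network neighbors
        grads = np.stack([local_gradient(mixed[m], Xs[m], ys[m])
                          for m in range(M)])
        thetas = mixed - rates[:, None] * grads  # client-specific step size
    return thetas, rates

# Synthetic setup (hypothetical): 10 clients on a ring, 3 with poisoned responses.
rng = np.random.default_rng(0)
M, n, p = 10, 200, 3
theta_true = np.array([1.0, -2.0, 0.5])
abnormal = np.array([0, 4, 7])
normal = np.setdiff1d(np.arange(M), abnormal)
Xs = [rng.standard_normal((n, p)) for _ in range(M)]
ys = [Xs[m] @ (-theta_true if m in abnormal else theta_true)
      + rng.standard_normal(n) for m in range(M)]

# Ring network: each client averages itself and its two neighbors.
I = np.eye(M)
W = (I + np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1)) / 3.0

# Stand-in initial estimator (assumption): coordinate-wise median of local fits.
local_fits = np.stack([np.linalg.lstsq(Xs[m], ys[m], rcond=None)[0]
                       for m in range(M)])
theta_init = np.median(local_fits, axis=0)

thetas_ad, rates = run_adfl_sketch(Xs, ys, W, theta_init, lam=5.0)
thetas_un, _ = run_adfl_sketch(Xs, ys, W, theta_init, lam=0.0)  # equal rates
err_adaptive = np.linalg.norm(thetas_ad.mean(axis=0) - theta_true)
err_uniform = np.linalg.norm(thetas_un.mean(axis=0) - theta_true)
print("rates:", np.round(rates, 4))
print("error (adaptive):", err_adaptive, "error (uniform):", err_uniform)
```

Setting `lam=0.0` recovers equal learning rates, so the same function doubles as a non-robust baseline. In this toy setup the poisoned clients receive rates close to zero and contribute almost nothing to the consensus estimate. The sketch also makes one reviewer's concern concrete: the scores are only informative if the initial estimate is already reasonably close to the target, because far from it every client has a large gradient norm.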
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The proposed algorithm relaxes the constraints on the number of neighboring clients in a decentralized network, making it more flexible in real applications where the adversary arises from data contamination. The mechanism is also easy to implement. 2. Extensive experiments have been conducted to show the effectiveness of the algorithm. Both regression and classification tasks are simulated on various network topologies. 3. The algorithm's effectiveness is supported by theoretical analysis. 4
1. The literature review and experimental comparison with the state of the art neglect one recent work that also does not require constraints on the neighboring clients. [1] Zhang, K., Basharat, A. and Xu, P., 2024, December. Byzantine-robust decentralized federated learning via local performance checking. In International Conference on Neural Information Processing (pp. 171-185). Singapore: Springer Nature Singapore. 2. Only the data-contaminated adversary setting is considered in this work, which i
1. The paper introduces a flexible and practical approach to robustness in decentralized federated learning by leveraging adaptive, gradient-norm-based client weighting. 2. Theoretical analyses are compelling, providing convergence behavior, oracle property, and clear conditions for robustness. 3. Experimental evaluation is thorough, covering various synthetic corruptions (e.g., bit-flipping), network structures (e.g., Directed Circle), and practical tasks.
1. The process of learning rate adaptation centers on the mapping $\pi(x) = \exp(-x)$ and the tuning parameter $\lambda_n$ (Equation 4.4). However, there is insufficient elaboration in the main paper on how $\lambda_n$ is chosen, beyond a brief mention of cross-validation and a theoretical interval ($\log N \lesssim \lambda_n \lesssim \sqrt{n} M^{-1/8}$). 2. The paper does not provide an explicit ablation study isolating the impact of the initial estimator’s quality. How do other decentralized r
1. The investigated problem of robust decentralized federated learning is both important and underexplored in this field.
1. The idea of assigning trustworthiness weights to clients based on their gradient norms is questionable. It is unclear why trustworthy clients would consistently have small gradient norms while abnormal clients would have large ones, as this does not hold throughout the training process. For instance, during the early stages of training, all clients typically exhibit large gradient norms. This represents a significant weakness of the paper, and the reviewer is not convinced by this approach.
* The paper studies an interesting research question where we have abnormal clients with corrupted or low-quality data; * The intuition is elegant and practical.
In short, the paper suffers from significant presentation issues, which lead to almost unreadable technical details. Some of the claims in the paper are either incorrect or are not backed by rigorous evidence. * The flow of the paper can be significantly improved. For example, in the abstract, the authors mention they are free from the stringent conditions on neighboring nodes without discussing what the conditions are or what the prior knowledge is. In lines 55 - 58, the methodology part actual
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
