Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization
Zhe Li, Bicheng Ying, Zidong Liu, Chaosheng Dong, Haibo Yang

TL;DR
This paper introduces DeComFL, a dimension-free federated learning algorithm using zeroth-order optimization, drastically reducing communication costs regardless of model size, with theoretical guarantees and practical validation on large models.
Contribution
DeComFL is the first dimension-free communication algorithm for federated learning leveraging zeroth-order methods, achieving constant communication per round and theoretical convergence guarantees.
Findings
Reduces communication from O(d) to O(1) per round
Achieves linear speedup in the number of clients and local steps
Demonstrates significant practical reductions in communication overhead
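To make the O(d) versus O(1) contrast concrete, the following back-of-the-envelope sketch compares the bytes needed to send one full parameter vector with the bytes needed to send a single scalar. The function names, the float32 encoding, and the example dimensions are illustrative assumptions, not figures from the paper.

```python
def dense_bytes(dim, bytes_per_value=4):
    """Bytes to transmit one full d-dimensional vector (float32 assumed)."""
    return dim * bytes_per_value


def scalar_bytes(num_scalars=1, bytes_per_value=4):
    """Bytes to transmit a fixed number of scalars; independent of dim."""
    return num_scalars * bytes_per_value


# Hypothetical model sizes: the dense cost grows linearly with dim,
# while the scalar cost stays constant.
for dim in (10**6, 10**9):
    print(f"dim={dim}: dense={dense_bytes(dim)} B, scalar={scalar_bytes()} B")
```

Under these assumptions, the dense payload grows by a factor of 1000 between the two example sizes while the scalar payload does not change.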
Abstract
Federated Learning (FL) offers a promising framework for collaborative and privacy-preserving machine learning across distributed data sources. However, the substantial communication costs associated with FL significantly challenge its efficiency. Specifically, in each communication round, the communication costs scale linearly with the model's dimension, which presents a formidable obstacle, especially in large model scenarios. Despite various communication-efficient strategies, the intrinsic dimension-dependent communication cost remains a major bottleneck for current FL implementations. This paper proposes a novel dimension-free communication algorithm, DeComFL, which leverages zeroth-order optimization techniques and reduces the communication cost from O(d) to O(1) by transmitting only a constant number of scalar values between clients and the server in…
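The idea of exchanging scalars instead of d-dimensional vectors can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes one local step per round, full client participation, a forward-difference zeroth-order estimator, and a seed shared by the server so that every party can regenerate the same random direction. All function names, the quadratic client losses, and the hyperparameters are hypothetical.

```python
import numpy as np


def perturbation(seed, dim):
    """Regenerate the same Gaussian direction anywhere from a shared seed."""
    return np.random.default_rng(seed).standard_normal(dim)


def client_scalar(loss_fn, x, seed, mu=1e-3):
    """The one scalar a client would send: a finite-difference estimate
    of the directional derivative of its loss along the seeded direction."""
    z = perturbation(seed, x.size)
    return (loss_fn(x + mu * z) - loss_fn(x)) / mu


def apply_update(x, seed, scalar, lr):
    """Rebuild the d-dimensional update from (seed, scalar) alone."""
    return x - lr * scalar * perturbation(seed, x.size)


def run(dim=50, rounds=2000, n_clients=4, lr=0.005):
    """Toy federated loop on hypothetical quadratic client losses."""
    rng = np.random.default_rng(0)
    targets = [rng.standard_normal(dim) for _ in range(n_clients)]
    losses = [lambda x, t=t: 0.5 * np.sum((x - t) ** 2) for t in targets]

    def avg_loss(x):
        return float(np.mean([f(x) for f in losses]))

    x = np.zeros(dim)
    initial = avg_loss(x)
    for r in range(rounds):
        seed = 1000 + r  # assumed: the server broadcasts one seed per round
        scalars = [client_scalar(f, x, seed) for f in losses]  # uplink: 1 scalar each
        x = apply_update(x, seed, float(np.mean(scalars)), lr)  # downlink: 1 scalar
    return initial, avg_loss(x)
```

Calling `run()` returns the average client loss before and after training. In this sketch, each round moves one seed and one scalar per client, whatever the value of `dim`.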
Peer Reviews
Decision: ICLR 2025 Poster
1. The paper is generally well-written and has a good flow.
2. The convergence analysis is necessary and duly provided. The discussion on the effective rank assumption to improve the pessimistic convergence bound is interesting. I did not check through the details for the correctness of the proof.
3. The algorithm design is sound.
1. I am not convinced about the critical role of zeroth-order optimization in the problem setting to reduce communication costs.
2. Parts about the related works and the experiments could be improved, as detailed below in the Questions.
The problem tackled is interesting and important, and the proposed method saves a lot of communication (on the order of 1000s in experiments). The theoretical analysis allows one to reason about potential communication savings over the course of training. Experiments are done on large models (up to OPT-1.3B).
1. The paper does not state how exactly the random seeds are chosen, which might affect the distribution of the generated sequence. As far as I know, random generators guarantee the distribution of a sequence of numbers sampled from a single generator initialized once with some random seed. However, when each number has its own random generator with its own random seed, I am not sure what guarantees exist, and I imagine it depends on the distributions of the random seeds and particular implemen
It is quite novel to see the use of a zeroth-order method for federated learning, and this paper makes a valuable contribution to this area. With small and clever modifications to the previous algorithm by Fang et al. (2022), this research effectively reduces the per-iteration communication costs to a constant for each agent. Supported by both theoretical and experimental evidence, this new method significantly outperforms FedAvg in terms of communication costs.
The assumption made in Theorem 2 is not very standard. I am not sure if $\kappa$ can truly be seen as an $O(1)$ constant independent of $d$. What would be the consequence if $\kappa$ scales up with $d$, even if it is not $\Theta(d)$?
Minor:
1. I think the algorithm was stated for $P=1$. When reading pages 4 and 5, $P$ does not appear to be any part of the algorithm. It was confusing what role the constant $P$ plays in the algorithm.
2. In Assumption 4, the second maximum should be over
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Cooperative Communication and Network Coding
