Connecting Federated ADMM to Bayes
Siddharth Swaroop, Mohammad Emtiyaz Khan, Finale Doshi-Velez

TL;DR
This paper connects federated ADMM to Variational Bayes and uses that connection to derive new ADMM variants, which improve federated learning performance in numerical experiments.
Contribution
It establishes a link between ADMM and Variational Bayes in federated learning, introducing new variants that leverage their combined strengths.
Findings
New ADMM variants derived from VB with flexible covariances.
Improved federated learning performance demonstrated through experiments.
Theoretical connection bridging two distinct federated learning approaches.
Abstract
We provide new connections between two distinct federated learning approaches based on (i) ADMM and (ii) Variational Bayes (VB), and propose new variants by combining their complementary strengths. Specifically, we show that the dual variables in ADMM naturally emerge through the 'site' parameters used in VB with isotropic Gaussian covariances. Using this, we derive two versions of ADMM from VB that use flexible covariances and functional regularisation, respectively. Through numerical experiments, we validate the improvements obtained in performance. The work shows a connection between two fields that are believed to be fundamentally different and combines them to improve federated learning.
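To make the role of the dual variables concrete, here is a minimal sketch of standard consensus (federated) ADMM on a toy least-squares problem. It is not the paper's method or any of its proposed variants. The quadratic local losses, the function names `federated_admm`, `make_toy_clients`, and `centralized_solution`, and the settings `rho=1.0` and `num_rounds=300` are all assumptions made for illustration. The per-client array `duals` holds the ADMM dual variables, which are the quantities the abstract relates to VB site parameters.

```python
import numpy as np


def make_toy_clients(seed=0, num_clients=3, num_samples=20, dim=3):
    """Hypothetical toy data: each client has slightly different true weights."""
    rng = np.random.default_rng(seed)
    w_shared = rng.normal(size=dim)
    A_list, b_list = [], []
    for _ in range(num_clients):
        A = rng.normal(size=(num_samples, dim))
        w_client = w_shared + 0.5 * rng.normal(size=dim)  # client heterogeneity
        b = A @ w_client + 0.1 * rng.normal(size=num_samples)
        A_list.append(A)
        b_list.append(b)
    return A_list, b_list


def centralized_solution(A_list, b_list):
    """Minimiser of sum_k (1 / (2 n_k)) * ||A_k w - b_k||^2, for comparison."""
    H = sum(A.T @ A / A.shape[0] for A in A_list)
    g = sum(A.T @ b / A.shape[0] for A, b in zip(A_list, b_list))
    return np.linalg.solve(H, g)


def federated_admm(A_list, b_list, rho=1.0, num_rounds=300):
    """Consensus ADMM with local losses f_k(w) = (1 / (2 n_k)) * ||A_k w - b_k||^2."""
    num_clients = len(A_list)
    dim = A_list[0].shape[1]
    w_global = np.zeros(dim)
    w_local = np.zeros((num_clients, dim))
    duals = np.zeros((num_clients, dim))  # one dual vector v_k per client

    for _ in range(num_rounds):
        # Client step: minimise f_k(w) + v_k^T (w - w_global) + (rho/2) ||w - w_global||^2.
        for k, (A, b) in enumerate(zip(A_list, b_list)):
            n = A.shape[0]
            lhs = A.T @ A / n + rho * np.eye(dim)
            rhs = A.T @ b / n - duals[k] + rho * w_global
            w_local[k] = np.linalg.solve(lhs, rhs)

        # Server step: average the dual-corrected local weights.
        w_global = np.mean(w_local + duals / rho, axis=0)

        # Dual step: each v_k accumulates the client's disagreement with the server.
        duals += rho * (w_local - w_global)

    return w_global, duals
```

Calling `w, duals = federated_admm(*make_toy_clients())` and comparing `w` with `centralized_solution(...)` on the same data gives a quick sanity check. At a fixed point, each `duals[k]` equals the negative gradient of client k's loss at the shared solution, which is why the dual variables sum to zero across clients.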
Peer Reviews
Decision: ICLR 2025 Poster
**[S1]** - The work is clear, well-written, and polished. I particularly liked the approach of first building the connection between ADMM and PVI, and then proposing three new methods/variants of FedLap that incrementally improve accuracy. For those familiar with the probabilistic perspective of federated learning, PVI, and the Laplace approximation, the work and particularly the contributions are easy to follow and are nicely supported by the empirical results.
**[S2]** - The drawing o
Although I think the current version of the work has significant strengths that merit acceptance, I found some unclear parts, or at least "corners" where there is not much light to understand what is going on. Some of these are:
**[W1]** - Computational cost is certainly one. While reading the paper, the missing discussion of computational cost will stand out to the average reader. I say this mainly because the connection between ADMM and PVI is fantastic, but once the improveme
## Main Comments
- __intuitive demonstration of equivalence__: models trained by minimizing a loss function plus a regularizer are often easily shown to correspond to a log likelihood plus a log prior, and hence to a MAP estimate. This work shows the same connection in a less obvious setting.
- __intuitive extension__: as a corollary of the above, generalizing the ADMM method to more general distributions makes perfect sense and allows natural update equations, notably the precision matrix "w
I felt the paper was very good and only have minor comments.
### Minor Comments
- __limitations__: I may have missed this, but limitations do not seem to be clearly discussed.
- __computational complexity__: P6 only discusses the diagonal covariance case, which has linear added complexity. If I understand correctly, however, FedLap-Cov has quadratically many additional parameters (the upper triangle of a covariance matrix). A few concerns come to mind: (1) one goal of federated learning is handling unstable client
1. Connections between ADMM and VB-based approaches may be interesting for federated learning.
1. Whether the federated local solutions can be used to recover the global solution is justified neither theoretically nor numerically.
2. Why do you want to connect ADMM to Bayes? Bayesian learning is known to be the go-to method for uncertainty quantification and non-convex optimization. The algorithm is only evaluated from an optimization perspective, and the Bayes interpretation is only used as a preconditioner. If this is the only purpose, the connections to Bayes are not needed at all.
Taxonomy
Topics: Advanced Database Systems and Queries · Distributed systems and fault tolerance · Privacy-Preserving Technologies in Data
Methods: Alternating Direction Method of Multipliers
