Federated ADMM from Bayesian Duality

Thomas Möllenhoff; Siddharth Swaroop; Finale Doshi-Velez; Mohammad Emtiyaz Khan

arXiv:2506.13150·cs.LG·March 5, 2026

TL;DR

This paper uses variational-Bayesian duality to generalize federated ADMM, yielding a Newton-like variant that converges in one step on quadratic objectives and an Adam-like variant with up to 7% accuracy gains in deep heterogeneous settings.

Contribution

The paper shows that solutions of variational-Bayesian objectives carry a duality structure that generalizes the fixed points of ADMM, and uses this structure to derive new federated ADMM variants.

Findings

01

ADMM-like updates are recovered with isotropic-Gaussian VB objectives.

02

New variants include Newton-like and Adam-like algorithms.

03

The Adam-like variant improves accuracy by up to 7% in deep heterogeneous federated learning.
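
For orientation, the classical federated (consensus) ADMM that Finding 01 refers to can be sketched as follows. This is the standard textbook scaled-form algorithm run on toy quadratic client losses, not the paper's Bayesian variant. The function names `consensus_admm` and `make_problem`, the penalty `rho`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def consensus_admm(A_list, b_list, rho=1.0, num_rounds=500):
    """Scaled-form consensus ADMM for client losses
    f_k(x) = 0.5 * x^T A_k x - b_k^T x (hypothetical toy losses)."""
    K = len(A_list)
    d = b_list[0].shape[0]
    z = np.zeros(d)        # global (server) variable
    u = np.zeros((K, d))   # scaled dual variables, one per client
    x = np.zeros((K, d))   # local (client) variables
    I = np.eye(d)
    for _ in range(num_rounds):
        # Client step: minimize f_k(x) + (rho / 2) * ||x - z + u_k||^2.
        for k in range(K):
            x[k] = np.linalg.solve(A_list[k] + rho * I,
                                   b_list[k] + rho * (z - u[k]))
        # Server step: average the dual-corrected client variables.
        z = (x + u).mean(axis=0)
        # Dual step: accumulate each client's disagreement with the server.
        u += x - z
    return z

def make_problem(K=4, d=3, seed=0):
    """Random, well-conditioned, heterogeneous quadratics (toy data)."""
    rng = np.random.default_rng(seed)
    A_list, b_list = [], []
    for _ in range(K):
        M = rng.standard_normal((d, d))
        A_list.append(M @ M.T / d + np.eye(d))
        b_list.append(rng.standard_normal(d))
    return A_list, b_list

A_list, b_list = make_problem()
z = consensus_admm(A_list, b_list)
# Minimizer of the summed objective, for comparison.
x_star = np.linalg.solve(sum(A_list), sum(b_list))
print("max |z - x_star| =", np.abs(z - x_star).max())
```

Each round exchanges only a vector per client, and the server variable `z` approaches the minimizer of the summed objective over many rounds rather than in one.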

Abstract

We propose a new Bayesian approach to generalize the federated Alternating Direction Method of Multipliers (ADMM). We show that the solutions of variational-Bayesian (VB) objectives are associated with a duality structure that not only resembles the structure of ADMM's fixed-points but also generalizes it. For example, ADMM-like updates are recovered when the VB objective is optimized over the isotropic-Gaussian family, and new non-trivial extensions are obtained for other exponential-family distributions. These extensions include a Newton-like variant that converges in one step on quadratic objectives and an Adam-like variant that yields up to 7% accuracy boosts for deep heterogeneous cases. Our work opens a new Bayesian way to generalize ADMM and other primal-dual methods.
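
One way to build intuition for the one-step claim on quadratic objectives: when every client loss is exactly quadratic, sharing curvature along with the local optimum lets a server recover the global minimizer in a single aggregation, whereas plain averaging of local optima generally does not. The sketch below shows only this textbook fact (equivalently, multiplying Gaussian factors by adding their natural parameters). It is not the paper's update rule, and all names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d = 5, 3

# Hypothetical client losses f_k(x) = 0.5 * (x - m_k)^T H_k (x - m_k),
# with heterogeneous curvature H_k and local optimum m_k.
H, m = [], []
for _ in range(K):
    M = rng.standard_normal((d, d))
    H.append(M @ M.T + 0.5 * np.eye(d))
    m.append(rng.standard_normal(d))

def total_loss(x):
    return sum(0.5 * (x - m_k) @ H_k @ (x - m_k) for H_k, m_k in zip(H, m))

def total_grad(x):
    return sum(H_k @ (x - m_k) for H_k, m_k in zip(H, m))

# Curvature-aware aggregation: each client sends (H_k, H_k m_k),
# the server sums both and solves one linear system.
precision_sum = sum(H)
weighted_sum = sum(H_k @ m_k for H_k, m_k in zip(H, m))
x_one_shot = np.linalg.solve(precision_sum, weighted_sum)

# Curvature-free aggregation: average the local optima.
x_plain_avg = np.mean(m, axis=0)

print("grad norm, curvature-aware:", np.linalg.norm(total_grad(x_one_shot)))
print("grad norm, plain average:  ", np.linalg.norm(total_grad(x_plain_avg)))
```

The price of the single round is that each client communicates a full d-by-d matrix, which is impractical for deep networks; diagonal curvature estimates, as in Adam-style methods, trade exactness for cost.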

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The paper proposes a novel ADMM-like extension to federated learning, with good experimental results.

Weaknesses

The argument seemed to proceed more by analogy than by exact equality. The claim that ADMM is recovered exactly is misleading, because the recovery requires an approximation. A couple of minor points:
- In Equation 3, a subscript k is missing.
- On line 195, A* should be defined in the main text rather than only in the appendix.
- In Figure 4, it would be helpful to mention that each line is numbered with the iteration number.

Reviewer 02 · Rating 6 · Confidence 2

Strengths

The authors introduce a Bayesian duality, from which an extension of ADMM that optimizes over distributions follows naturally. For Gaussians with fixed variance they recover standard ADMM, while general Gaussians give Newton-like methods and IVON-ADMM. These perform well compared with recent baselines. Other approximating distributions may lead to interesting new splitting algorithms and, more generally, open up new research paths to extend and improve primal-dual algorithms using B

Weaknesses

Federated ADMM frameworks come with theoretical guarantees on communication complexity and iteration complexity. Can the authors briefly discuss the communication and iteration complexity of Bayesian ADMM?

Reviewer 03 · Rating 2 · Confidence 4

Strengths

Conceptual novelty: The paper establishes a novel Bayesian duality perspective that unifies ADMM and VB under a single framework. This is an interesting connection that could inspire extensions of primal-dual optimization methods.
Clear motivation and exposition: The introduction and background are well written and clearly position the work relative to prior ADMM and PVI approaches (Swaroop et al., 2025).
Framework generality: The proposed Bayesian duality formulation provides a pri

Weaknesses

Soundness of the formulation: While the high-level idea is promising, the derivation in Section 3.3 raises concerns about mathematical consistency:
- The "Bayesian ADMM" updates (Eqns. 12-14) are expected to follow from alternating optimization of the Lagrangian in Eqn. 11. However, the replacement of the dual update term $\mu_k - \bar{\mu}$ with $\lambda_k - \bar{\lambda}$ lacks justification within the Lagrangian formulation. The reasoning provided in Appendix E.2, appealing to Bayesian in

Code & Models

Repositories

team-approx-bayes/bayes-admm
PyTorch · Official

Taxonomy

Topics: Cryptography and Data Security · Distributed systems and fault tolerance · Error Correcting Code Techniques

Methods: Alternating Direction Method of Multipliers