Communication-Efficient Gluon in Federated Learning

Xun Qian; Alexander Gaponov; Grigory Malinovsky; Peter Richtárik

arXiv:2604.10689·cs.LG·April 14, 2026

TL;DR

This paper studies Gluon, an extension of Muon-type optimizers, in a communication-constrained federated setting. It combines unbiased and contractive compression with SARAH-style variance reduction, with the aim of lowering communication cost when training large models across many machines.

Contribution

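The paper analyzes Gluon, an extension of Muon to the layer-wise $(L^{0}, L^{1})$-smooth setting, with both unbiased and contractive compressors. It uses SARAH-style variance reduction to reduce the compression error, and obtains convergence rates and improved communication cost under certain conditions.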

Findings

01

As a byproduct, the paper obtains a new variance-reduced algorithm with a faster convergence rate than Gluon.

02

Under certain conditions, the proposed compressed methods achieve improved communication cost.

03

Numerical experiments confirm the effectiveness of the compression techniques.

Abstract

Recent developments have shown that Muon-type optimizers based on linear minimization oracles (LMOs) over non-Euclidean norm balls have the potential to outperform Adam-type methods in practice when training large language models. Since large-scale neural networks are trained across a large number of machines, communication cost becomes the bottleneck. To address this bottleneck, we investigate Gluon, an extension of Muon to the more general layer-wise $(L^{0}, L^{1})$-smooth setting, with both unbiased and contractive compressors. To reduce the compression error, we employ the variance reduction technique from SARAH in our compressed methods. Convergence rates and improved communication cost are achieved under certain conditions. As a byproduct, we obtain a new variance-reduced algorithm with a faster convergence rate than Gluon. We also incorporate…
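
The abstract names two compressor classes without saying which operators the paper uses. As a rough illustration only, the sketch below uses two textbook examples: Top-K as a contractive compressor, satisfying $\|C(x) - x\|^2 \le (1 - k/d)\|x\|^2$, and scaled Rand-K as an unbiased compressor, satisfying $\mathbb{E}[C(x)] = x$. The function names `top_k`, `rand_k`, and `sq_norm` and the sample vector are assumptions made for this example, not taken from the paper.

```python
import random


def top_k(x, k):
    """Contractive compressor (illustrative): keep the k largest-magnitude entries."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out


def rand_k(x, k, rng):
    """Unbiased compressor (illustrative): keep k random entries, scaled by d/k."""
    d = len(x)
    keep = rng.sample(range(d), k)
    out = [0.0] * d
    for i in keep:
        out[i] = x[i] * d / k  # the d/k scaling makes the expectation equal x
    return out


def sq_norm(x):
    """Squared Euclidean norm of a list of floats."""
    return sum(v * v for v in x)


# Hypothetical vector standing in for a flattened gradient.
x = [3.0, -1.0, 0.5, 4.0, -2.0, 0.1]
d, k = len(x), 2

# Top-K: squared compression error versus the contraction bound.
c = top_k(x, k)
err = sq_norm([a - b for a, b in zip(c, x)])
bound = (1 - k / d) * sq_norm(x)

# Rand-K: the empirical mean over many draws should approach x.
rng = random.Random(0)
trials = 20000
mean = [0.0] * d
for _ in range(trials):
    r = rand_k(x, k, rng)
    for i in range(d):
        mean[i] += r[i] / trials

print("top-k output:", c)
print("top-k error:", err, "bound:", bound)
print("rand-k empirical mean:", [round(m, 2) for m in mean])
```

In both cases only k of the d entries (plus their indices) need to be sent. Top-K is biased but has a bounded relative error, while Rand-K is unbiased at the cost of extra variance, and that variance is the kind of error a variance reduction technique is meant to control.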
