BiCoLoR: Communication-Efficient Optimization with Bidirectional Compression and Local Training
Laurent Condat, Artavazd Maranjyan, Peter Richtárik

TL;DR
BiCoLoR is a novel distributed optimization algorithm that combines bidirectional compression with local training, significantly reducing communication costs in federated learning over wireless networks.
Contribution
It introduces the first algorithm to integrate bidirectional compression with local training, providing accelerated guarantees in heterogeneous convex settings.
Findings
Outperforms existing algorithms in communication efficiency
Achieves accelerated convergence guarantees in convex settings
Establishes new standards for bidirectional communication compression
Abstract
Slow and costly communication is often the main bottleneck in distributed optimization, especially in federated learning where it occurs over wireless networks. We introduce BiCoLoR, a communication-efficient optimization algorithm that combines two widely used and effective strategies: local training, which increases computation between communication rounds, and compression, which encodes high-dimensional vectors into short bitstreams. While these mechanisms have been combined before, compression has typically been applied only to uplink (client-to-server) communication, leaving the downlink (server-to-client) side unaddressed. In practice, however, both directions are costly. We propose BiCoLoR, the first algorithm to combine local training with bidirectional compression using arbitrary unbiased compressors. This joint design achieves accelerated complexity guarantees in both convex…
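To make the notion of an unbiased compressor concrete, here is a minimal sketch of rand-k sparsification, a standard example of such a compressor. It is an illustration only, not code from the paper; the function name `rand_k` and the dimensions below are assumptions chosen for the example.

```python
import numpy as np

def rand_k(x, k, rng):
    """Rand-k sparsifier (illustrative): keep k random coordinates of x,
    scaled by d/k so that the output equals x in expectation."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(20)

# Empirical check of unbiasedness and of the relative variance,
# which for rand-k equals d/k - 1 (here 20/5 - 1 = 3) in expectation.
samples = np.stack([rand_k(x, 5, rng) for _ in range(20000)])
mean_error = np.linalg.norm(samples.mean(axis=0) - x) / np.linalg.norm(x)
rel_var = np.mean(np.sum((samples - x) ** 2, axis=1)) / np.sum(x ** 2)
print(mean_error, rel_var)
```

Only the k kept values and their indices need to be transmitted, which is where the communication saving comes from; the price is the extra variance that the algorithm has to control.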
Peer Reviews
Decision · Submitted to ICLR 2026
The problem considered in the paper is novel and highly relevant to the conference. The paper considers and decouples uplink and downlink costs in distributed optimization. This is done by considering an extra shared parameter $y$ and only communicating differences to this common $y$, resulting in the error variance being added instead of multiplied. The algorithm is versatile and gets accelerated guarantees in both strongly convex and general convex settings. The paper also discusses why impr
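The idea of communicating only differences to a shared reference point can be illustrated with a small sketch. It assumes a rand-k compressor and a hypothetical reference vector `y` already known to both sender and receiver; it is not BiCoLoR's actual update rule, only a demonstration that the compression error then scales with the distance to `y` rather than with the norm of the vector itself.

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased rand-k sparsifier (illustrative choice of compressor)."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

def send_direct(x, k, rng):
    # Compress the vector itself: error is proportional to ||x||^2.
    return rand_k(x, k, rng)

def send_via_reference(x, y, k, rng):
    # The receiver already holds y, so only the compressed difference
    # travels: error is proportional to ||x - y||^2.
    return y + rand_k(x - y, k, rng)

rng = np.random.default_rng(1)
d, k = 100, 10
y = rng.standard_normal(d)            # hypothetical shared reference point
x = y + 0.1 * rng.standard_normal(d)  # vector to transmit, close to y

err_direct = np.mean(
    [np.sum((send_direct(x, k, rng) - x) ** 2) for _ in range(2000)]
)
err_reference = np.mean(
    [np.sum((send_via_reference(x, y, k, rng) - x) ** 2) for _ in range(2000)]
)
print(err_direct, err_reference)
```

Both schemes send the same number of coordinates and both are unbiased, but the second one is far more accurate whenever `x` stays close to the shared reference.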
In the case where $\alpha$ is not too small, the TotalCom is the same as standard accelerated gradient descent (AGD), i.e., $\tilde{\mathcal{O}}(d \sqrt{\kappa})$. Even though AGD sends full gradients, its faster convergence rate than BiCoLoR lets it achieve the same TotalCom. So the results of BiCoLoR show that compression can get similar TotalCom, but not better. The authors do discuss why getting better convergence in terms of $d$ would be harder. This is even without considering the bits re
This paper studies communication efficiency, a key challenge in distributed optimization, by adopting a realistic setting that reduces both uplink and downlink bandwidth. The authors provide convergence guarantees, analyze communication complexity, and validate the approach empirically. Without requiring the transmission of full vectors with small probability, the algorithm achieves a similar bound on total communication complexity. The experimental results show better performance compared to exis
1. The writing needs to be improved: there is no conclusion section and there are several grammar issues. Many abbreviations hinder readability. Some notation is used without definition or with multiple meanings (like $\phi$ in Lines 119-124).
2. The algorithm is too complicated, with many hyperparameters and moving parts; a schematic or simplified pseudocode would improve accessibility. The algorithm appears to build on LoCoDL and bidirectional compression ideas. It would be better if the authors c
+ The algorithmic plumbing (server sends its own compressed difference; clients don’t receive the aggregated uplink average) keeps uplink/downlink stochasticity independent, avoiding the typical multiplicative variance blow-up and enabling sharper complexity.
+ Table-style comparisons in text relate BiCoLoR to MURANA, MCM, EF21-P+DIANA, and 2Direction; BiCoLoR achieves the same asymptotic TotalCom without occasional full-precision sends.
+ Theorem 4.1 specifies step sizes (ρ,η) delivers linear
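The multiplicative variance blow-up mentioned above can be seen from a standard calculation for unbiased compressors; this is a general fact, not the paper's own analysis. Assume two independent unbiased compressors $C_u$ (uplink) and $C_d$ (downlink) satisfying $\mathbb{E}\|C(x)-x\|^2 \le \omega \|x\|^2$ with constants $\omega_u$ and $\omega_d$ (notation introduced here for illustration). If the downlink compressor is applied to the output of the uplink compressor, then

$$
\mathbb{E}\|C_d(C_u(x))\|^2 \le (1+\omega_d)\,\mathbb{E}\|C_u(x)\|^2 \le (1+\omega_d)(1+\omega_u)\|x\|^2,
$$

and since the composition is still unbiased,

$$
\mathbb{E}\|C_d(C_u(x)) - x\|^2 \le (\omega_u + \omega_d + \omega_u\omega_d)\|x\|^2 .
$$

If instead the two compressors act independently on separate vectors $a$ and $b$, the cross term vanishes and the errors simply add:

$$
\mathbb{E}\|C_u(a) + C_d(b) - (a+b)\|^2 \le \omega_u\|a\|^2 + \omega_d\|b\|^2 .
$$

The product term $\omega_u\omega_d$ is what a design with independent uplink and downlink randomness avoids.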
- There is no conclusion part in this paper.
- Empirical scope is thin. The only shown experiments (logistic regression on real-sim) are informative but narrow; there’s no large-scale non-IID, partial participation, or heterogeneous latency study, which matters for BiCC realism.
- Systems realism left implicit. The TotalCom model counts bits but omits control-plane costs (index/header overhead for sparsification/quantization, compressor seed sync, server broadcast fan-out), queueing, and stra
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Bandit Algorithms Research
