BiCoLoR: Communication-Efficient Optimization with Bidirectional Compression and Local Training

Laurent Condat; Artavazd Maranjyan; Peter Richtárik

arXiv:2601.12400·math.OC·January 21, 2026

Open Access · 3 Reviews

TL;DR

BiCoLoR is a novel distributed optimization algorithm that combines bidirectional compression with local training, significantly reducing communication costs in federated learning over wireless networks.

Contribution

The paper introduces the first algorithm to integrate bidirectional compression with local training, with accelerated guarantees in heterogeneous convex settings.

Findings

01. Outperforms existing algorithms in communication efficiency

02. Achieves accelerated convergence guarantees in convex settings

03. Establishes new standards for bidirectional communication compression

Abstract

Slow and costly communication is often the main bottleneck in distributed optimization, especially in federated learning where it occurs over wireless networks. We introduce BiCoLoR, a communication-efficient optimization algorithm that combines two widely used and effective strategies: local training, which increases computation between communication rounds, and compression, which encodes high-dimensional vectors into short bitstreams. While these mechanisms have been combined before, compression has typically been applied only to uplink (client-to-server) communication, leaving the downlink (server-to-client) side unaddressed. In practice, however, both directions are costly. We propose BiCoLoR, the first algorithm to combine local training with bidirectional compression using arbitrary unbiased compressors. This joint design achieves accelerated complexity guarantees in both convex…
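To make the term "unbiased compressor" concrete, here is a minimal sketch of one standard example, the rand-k sparsifier. It is an illustration only, not code from the paper: the function name `rand_k`, the dimensions, and the sample count are assumptions chosen for the demo. The compressor keeps k random coordinates and rescales them by d/k, so its expectation equals the input and its relative variance is d/k - 1.

```python
import numpy as np

def rand_k(x, k, rng):
    """Rand-k sparsifier (illustrative): keep k random coordinates of x,
    scaled by d/k so that the expectation of the output equals x."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)  # k distinct coordinates
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

rng = np.random.default_rng(0)
d, k, n_samples = 100, 10, 5000  # hypothetical sizes for the demo
x = rng.standard_normal(d)

# Draw many independent compressions of the same vector.
samples = np.stack([rand_k(x, k, rng) for _ in range(n_samples)])

# Unbiasedness: the empirical mean should be close to x.
mean_error = np.linalg.norm(samples.mean(axis=0) - x) / np.linalg.norm(x)

# Relative variance: E||C(x) - x||^2 / ||x||^2 should be close to d/k - 1.
omega_hat = np.mean(np.sum((samples - x) ** 2, axis=1)) / np.sum(x ** 2)

print("relative error of the empirical mean:", mean_error)
print("empirical relative variance:", omega_hat, "theory:", d / k - 1)
```

Only the k kept values and their indices need to be transmitted, which is where the communication saving comes from; the price is the variance factor d/k - 1.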

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

The problem considered in the paper is novel and highly relevant to the conference. The paper considers and decouples uplink and downlink costs in distributed optimization. This is done by considering an extra shared parameter $y$ and only communicating differences to this common $y$, resulting in the error variance being added instead of multiplied. The algorithm is versatile and gets accelerated guarantees in both strongly convex and general convex settings. The paper also discusses why impr
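The idea of communicating only differences to a shared vector $y$ can be illustrated with a small sketch. This is not BiCoLoR itself, only the generic mechanism under assumed names and sizes: with an unbiased compressor whose error scales with the squared norm of its input, compressing $x - y$ and adding $y$ back at the receiver gives an error proportional to $\|x - y\|^2$ rather than $\|x\|^2$. The rand-k compressor and all constants below are assumptions for the demo.

```python
import numpy as np

def rand_k(x, k, rng):
    """Illustrative unbiased rand-k sparsifier (not taken from the paper)."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

rng = np.random.default_rng(1)
d, k, trials = 200, 20, 2000            # hypothetical sizes for the demo
x = 10.0 + rng.standard_normal(d)       # vector the sender wants to transmit
y = x + 0.1 * rng.standard_normal(d)    # shared reference known to both sides

# Option A: compress x directly; the error scales with ||x||^2.
err_direct = np.mean(
    [np.sum((rand_k(x, k, rng) - x) ** 2) for _ in range(trials)]
)

# Option B: compress the difference x - y; the receiver adds y back.
# The error now scales with ||x - y||^2, which is small when y tracks x.
err_diff = np.mean(
    [np.sum((y + rand_k(x - y, k, rng) - x) ** 2) for _ in range(trials)]
)

print("mean squared error, direct compression:    ", err_direct)
print("mean squared error, difference compression:", err_diff)
```

Both estimates remain unbiased; the difference-based one is simply far less noisy whenever the shared reference stays close to the transmitted vector.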

Weaknesses

In the case where $\alpha$ is not too small, the TotalCom is the same as that of standard accelerated gradient descent (AGD), i.e., $\tilde{\mathcal{O}}(d \sqrt{\kappa})$. Even though AGD sends full gradients, it converges faster than BiCoLoR and thus achieves the same TotalCom. So the results of BiCoLoR show that compression can get similar TotalCom, but not better. The authors do discuss why getting better convergence in terms of $d$ would be harder. This is even without considering the bits re

Reviewer 02 · Rating 4 · Confidence 3

Strengths

This paper studies communication efficiency, a key challenge in distributed optimization, by adopting a realistic setting that reduces both uplink and downlink bandwidth. The authors provide convergence guarantees, analyze communication complexity, and validate the approach empirically. Without requiring full vectors to be transmitted with small probability, the algorithm achieves a similar bound on total communication complexity. The experimental results show better performance compared to exis

Weaknesses

1. The writing needs to be improved: there is no conclusion section, and there are several grammar issues. Many abbreviations hinder readability. Some notation is used without definition or with multiple meanings (like $\phi$ in Lines 119-124).
2. The algorithm is too complicated, with many hyperparameters and moving parts; a schematic or simplified pseudocode would improve accessibility. The algorithm appears to build on LoCoDL and bidirectional compression ideas. It would be better if the authors c

Reviewer 03 · Rating 2 · Confidence 3

Strengths

+ The algorithmic plumbing (server sends its own compressed difference; clients don’t receive the aggregated uplink average) keeps uplink/downlink stochasticity independent, avoiding the typical multiplicative variance blow-up and enabling sharper complexity.
+ Table-style comparisons in text relate BiCoLoR to MURANA, MCM, EF21-P+DIANA, and 2Direction; BiCoLoR achieves the same asymptotic TotalCom without occasional full-precision sends.
+ Theorem 4.1 specifies step sizes (ρ,η) delivers linear
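The "multiplicative variance blow-up" mentioned in the first point can be seen from a standard calculation, sketched here under assumptions that are not taken from the paper: two independent unbiased compressors $C_u$ (uplink) and $C_d$ (downlink), with relative variances $\omega_u$ and $\omega_d$, applied one after the other to the same vector. The symbols $C_u$, $C_d$, $\omega_u$, $\omega_d$ are notation introduced for this illustration.

```latex
% Assumptions: E[C(x)] = x and E\|C(x) - x\|^2 \le \omega \|x\|^2 for each compressor,
% with C_d independent of C_u.
\begin{align*}
\mathbb{E}\,\|C_d(C_u(x)) - x\|^2
  &= \mathbb{E}\,\|C_d(C_u(x)) - C_u(x)\|^2 + \mathbb{E}\,\|C_u(x) - x\|^2
     && \text{(cross term vanishes since } \mathbb{E}[C_d(z)] = z\text{)} \\
  &\le \omega_d\, \mathbb{E}\,\|C_u(x)\|^2 + \omega_u \|x\|^2 \\
  &\le \omega_d (1 + \omega_u) \|x\|^2 + \omega_u \|x\|^2
     && \text{(since } \mathbb{E}\,\|C_u(x)\|^2 = \|x\|^2 + \mathbb{E}\,\|C_u(x) - x\|^2\text{)} \\
  &= (\omega_u + \omega_d + \omega_u \omega_d)\, \|x\|^2 .
\end{align*}
```

The product term $\omega_u \omega_d$ is the blow-up: it appears when the downlink compressor is applied to an already compressed uplink quantity. A design in which the two compression errors enter independently would leave only the additive part $\omega_u + \omega_d$.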

Weaknesses

- There is no conclusion part in this paper.
- Empirical scope is thin. The only shown experiments (logistic regression on real-sim) are informative but narrow; there’s no large-scale non-IID, partial participation, or heterogeneous latency study, which matters for BiCC realism.
- Systems realism left implicit. The TotalCom model counts bits but omits control-plane costs (index/header overhead for sparsification/quantization, compressor seed sync, server broadcast fan-out), queueing, and stra

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Bandit Algorithms Research