Communication Compression for Distributed Learning with Aggregate and Server-Guided Feedback
Tomas Ortega, Chun-Yin Huang, Xiaoxiao Li, Hamid Jafarkhani

TL;DR
This paper introduces two communication compression frameworks for federated learning that remove the need for client-specific state while improving convergence; both are supported by theoretical analysis and experiments.
Contribution
The paper proposes the CAFe and CAFe-S frameworks, which enable biased compression without client-side state, and provides theoretical convergence guarantees and experimental validation.
Findings
CAFe outperforms standard biased compression methods in non-convex settings.
CAFe-S converges faster as server data becomes more representative.
Experimental results show superior performance over existing schemes.
Abstract
Distributed learning, and Federated Learning (FL) in particular, faces a significant communication bottleneck, especially in the uplink transmission of client-to-server updates, which is often constrained by asymmetric bandwidth limits at the edge. Biased compression techniques are effective in practice, but they require error feedback mechanisms to provide theoretical guarantees and to ensure convergence under aggressive compression. Standard error feedback, however, relies on client-specific control variates, which violates user privacy and is incompatible with the stateless clients common in large-scale FL. This paper proposes two novel frameworks that enable biased compression without client-side state or control variates. The first, Compressed Aggregate Feedback (CAFe), uses the globally aggregated update from the previous round as a shared control variate for all clients. The…
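To make the CAFe idea more concrete, here is a minimal sketch of one plausible reading of it: each stateless client compresses the difference between its local update and the previous round's global aggregate, and the server adds that shared control variate back after averaging. The top-k compressor, the helper names `top_k`, `client_message`, and `server_aggregate`, and the plain averaging rule are hypothetical choices made for illustration. The paper's actual algorithm (local training, step sizes, choice of compressor, how clients obtain the previous aggregate) may differ.

```python
from typing import List

Vector = List[float]


def top_k(x: Vector, k: int) -> Vector:
    """Biased top-k compressor (assumed choice): keep the k largest-magnitude
    entries of x and zero out the rest."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out


def client_message(delta: Vector, prev_aggregate: Vector, k: int) -> Vector:
    """Hypothetical stateless client step: compress the residual between the
    local update and the shared control variate (last round's aggregate).
    Nothing is stored on the client between rounds."""
    residual = [d - v for d, v in zip(delta, prev_aggregate)]
    return top_k(residual, k)


def server_aggregate(messages: List[Vector], prev_aggregate: Vector) -> Vector:
    """Hypothetical server step: average the compressed residuals and add the
    shared control variate back. The result is this round's aggregate update
    and becomes the control variate for the next round."""
    n = len(messages)
    mean_msg = [sum(col) / n for col in zip(*messages)]
    return [v + m for v, m in zip(prev_aggregate, mean_msg)]


if __name__ == "__main__":
    # Toy loop with two heterogeneous clients whose local updates stay fixed.
    deltas = [[1.0, 0.5, -0.2, 0.1], [0.8, 0.6, -0.1, 0.0]]
    aggregate = [0.0] * 4  # shared control variate, initialised to zero
    for rnd in range(3):
        msgs = [client_message(d, aggregate, k=2) for d in deltas]
        aggregate = server_aggregate(msgs, aggregate)
        print(rnd, aggregate)
```

Because the control variate is the same for every client and is recomputed by the server each round, no client keeps an error buffer between rounds, which is the property the abstract emphasises. Intuitively, when a client's update is close to the previous aggregate, the residual it compresses is small, so less information is lost to the biased compressor.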
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Distributed Sensor Networks and Detection Algorithms
