Cooperative Gradient Coding
Shudi Weng, Ming Xiao, Chao Ren, and Mikael Skoglund

TL;DR
This paper introduces cooperative gradient coding (CoGC) and an enhanced decoding method, GC$^+$, for distributed training, improving communication efficiency and reliability in federated learning over unreliable communication links.
Contribution
The paper proposes a cooperative gradient coding framework (CoGC) and a complementary decoding mechanism (GC$^+$), supported by theoretical analysis and simulations that indicate improved robustness and efficiency.
Findings
CoGC eliminates dataset replication, reducing communication and computation costs.
GC$^+$ significantly improves system reliability by recovering information lost during decoding failures.
Theoretical bounds and extensive simulations validate the effectiveness of the proposed methods.
Abstract
This work studies gradient coding (GC) in the context of distributed training problems with unreliable communication. We propose cooperative GC (CoGC), a novel gradient-sharing-based GC framework that leverages cooperative communication among clients. This approach ultimately eliminates the need for dataset replication, making it both communication- and computation-efficient and suitable for federated learning (FL). By employing the standard GC decoding mechanism, CoGC yields strictly binary outcomes: either the global model is exactly recovered, or the decoding fails entirely, with no intermediate results. This characteristic ensures the optimality of the training and demonstrates strong resilience to client-to-server communication failures when the communication channels among clients are in good condition. However, it may also result in communication inefficiency and hinder…
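The all-or-nothing behaviour of standard GC decoding described in the abstract can be illustrated with a small sketch. This is not the paper's CoGC construction or GC$^+$; it only shows the generic idea under assumptions made here: a hypothetical 3-client coding matrix `B` (a common textbook example that tolerates one missing client), and the helper names `encode` and `decode`. In CoGC, each coded message would presumably be formed from gradients shared among clients rather than from replicated datasets.

```python
import numpy as np

# Hypothetical coding matrix for 3 clients (illustrative, not from the paper).
# Row i holds the weights client i applies to the local gradients g_1..g_3.
B = np.array([[0.5, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [0.5, 0.0, 1.0]])

def encode(B, grads):
    """Row i of the result is client i's coded message sum_j B[i, j] * g_j."""
    return B @ grads

def decode(B, coded, received):
    """Return the exact gradient sum, or None if decoding fails.

    `received` lists the clients whose coded messages reached the server.
    Decoding succeeds only if the all-ones vector lies in the row span of
    B[received]; there is no partial result in between.
    """
    if len(received) == 0:
        return None
    rows = B[received]                       # shape (k, M)
    target = np.ones(B.shape[1])
    # Least-squares solve for a combination a with a^T rows = 1^T.
    a, *_ = np.linalg.lstsq(rows.T, target, rcond=None)
    if not np.allclose(rows.T @ a, target):
        return None                          # binary outcome: failure
    return a @ coded[received]               # binary outcome: exact sum

# Toy local gradients, one row per client.
grads = np.arange(6.0).reshape(3, 2)
coded = encode(B, grads)

print(decode(B, coded, [0, 2]))  # two messages arrive: the exact sum is recovered
print(decode(B, coded, [1]))     # one message arrives: decoding fails
```

With this matrix, any two of the three coded messages suffice to recover the exact sum of the local gradients, while a single message yields nothing usable, which mirrors the strictly binary outcome the abstract attributes to the standard decoder.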
