CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression
Zhize Li, Peter Richtárik

TL;DR
CANITA is a novel distributed convex optimization method that combines communication compression with acceleration, achieving faster convergence rates and reducing communication rounds in federated learning scenarios.
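In this line of work, communication compression is usually formalized through an unbiased compression operator C with variance parameter ω, meaning E[C(x)] = x and E||C(x) - x||² ≤ ω||x||². The summary does not name a specific compressor, so the example below is an assumption: random-k sparsification, which keeps k of the d coordinates and rescales them by d/k, giving ω = d/k - 1. The helper names `rand_k` and `omega_rand_k` are hypothetical. A minimal NumPy sketch:

```python
import numpy as np


def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Random-k sparsification: keep k uniformly chosen coordinates of x,
    scaled by d/k so that the compressed vector is an unbiased estimate of x."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)  # k distinct coordinates
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out


def omega_rand_k(d: int, k: int) -> float:
    """Variance parameter of rand-k: E||C(x) - x||^2 = (d/k - 1) * ||x||^2."""
    return d / k - 1


# Usage: with d = 10 and k = 2, only 2 of 10 entries are transmitted,
# each equal to 5 * x_i, and omega = 4.
rng = np.random.default_rng(0)
x = np.arange(1.0, 11.0)
c = rand_k(x, k=2, rng=rng)
```

Transmitting k values (plus their indices) instead of d is the communication saving; the price is the extra variance ω, which compressed methods must control.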
Contribution
It introduces CANITA, a compressed and accelerated gradient method that achieves the first accelerated rate for distributed general convex problems with communication compression, improving upon the rates of previous non-accelerated methods such as DIANA.
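DIANA, the non-accelerated baseline named above, compresses gradient differences rather than gradients: each worker i keeps a shift h_i, sends C(∇f_i(x) - h_i), and moves h_i toward its local gradient, so the compression error shrinks as the method converges. The following is a simplified sketch on a toy least-squares problem f(x) = (1/n) Σ_i 0.5||A_i x - b_i||². It assumes random-k compression, a shift stepsize α = 1/(ω+1), and a conservative stepsize γ = 1/((1 + 4ω/n)L); these are illustrative choices, the function name `diana` is hypothetical, and this is neither CANITA nor a reproduction of the paper's experiments.

```python
import numpy as np


def rand_k(x, k, rng):
    """Unbiased random-k sparsification (scaled by d/k)."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out


def diana(As, bs, k, iters, seed=0):
    """Sketch of DIANA on f(x) = (1/n) sum_i 0.5 * ||A_i x - b_i||^2."""
    rng = np.random.default_rng(seed)
    n, d = len(As), As[0].shape[1]
    omega = d / k - 1                                        # variance of rand-k
    L = max(np.linalg.eigvalsh(A.T @ A).max() for A in As)   # smoothness of each f_i
    gamma = 1.0 / ((1 + 4 * omega / n) * L)                  # conservative stepsize (assumption)
    alpha = 1.0 / (omega + 1)                                # shift stepsize
    x = np.zeros(d)
    h = [np.zeros(d) for _ in range(n)]                      # per-worker shifts h_i
    h_avg = np.zeros(d)                                      # server copy of (1/n) sum_i h_i
    for _ in range(iters):
        msgs = []
        for i in range(n):
            grad_i = As[i].T @ (As[i] @ x - bs[i])           # local gradient at x
            m = rand_k(grad_i - h[i], k, rng)                # only this vector is sent
            h[i] += alpha * m                                # worker updates its shift
            msgs.append(m)
        m_avg = np.mean(msgs, axis=0)
        g_hat = h_avg + m_avg                                # unbiased estimate of grad f(x)
        h_avg += alpha * m_avg                               # server mirrors the shift update
        x -= gamma * g_hat
    return x
```

Because the shifts h_i learn the local gradients at the optimum, the transmitted differences shrink, so the iterates converge to the exact solution despite compression; CANITA's contribution is to combine this kind of compression with acceleration.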
Findings
Achieves the first accelerated rate for compressed distributed optimization.
Outperforms state-of-the-art non-accelerated methods in convergence speed.
Reduces the number of communication rounds, especially when many devices participate, as is typical in large-scale federated learning.
Abstract
Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular. Moreover, the best theoretically and practically performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of communication rounds through faster convergence, e.g., Nesterov's accelerated gradient descent (Nesterov, 1983, 2004) and Adam (Kingma and Ba, 2014). In order to combine the benefits of communication compression and convergence acceleration, we propose a compressed and accelerated gradient method based on ANITA (Li, 2021) for distributed optimization, which we call CANITA. CANITA achieves the first accelerated rate, which improves upon the…
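The abstract cites Nesterov's accelerated gradient descent as the prototypical acceleration technique. For reference, here is a minimal sketch of the uncompressed, single-machine version in its FISTA-style form (momentum weights from the t_k recursion), which guarantees f(x_k) - f* ≤ 2L||x_0 - x*||²/(k+1)² for an L-smooth convex f, versus the O(1/k) rate of plain gradient descent. The function name `nesterov_agd` is hypothetical, and this is not ANITA or CANITA, whose update rules are not reproduced in this summary.

```python
import numpy as np


def nesterov_agd(grad, L, x0, iters):
    """Accelerated gradient descent (FISTA-style momentum) for an L-smooth
    convex f. Returns the iterates [x_0, x_1, ..., x_iters]."""
    x_prev = x0.copy()
    y = x0.copy()          # extrapolated point, y_1 = x_0
    t = 1.0                # momentum parameter t_1
    xs = [x0.copy()]
    for _ in range(iters):
        x = y - grad(y) / L                          # gradient step at y_k
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x + ((t - 1) / t_next) * (x - x_prev)    # momentum / extrapolation
        x_prev, t = x, t_next
        xs.append(x)
    return xs


# Usage on an ill-conditioned quadratic f(x) = 0.5 * sum_i D_i x_i^2 (L = 1, f* = 0).
D = np.linspace(1e-3, 1.0, 50)
xs = nesterov_agd(lambda z: D * z, 1.0, np.ones(50), 200)
```

Fewer iterations to reach a given accuracy translates directly into fewer communication rounds in the distributed setting, which is the motivation the abstract gives for combining acceleration with compression.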
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
Methods: Adam
