Multi-Timescale Primal Dual Hybrid Gradient with Application to Distributed Optimization
Junhui Zhang, Patrick Jaillet

TL;DR
This paper introduces multi-timescale variants of the PDHG algorithm that are robust to delays and heterogeneity, improving the efficiency of distributed optimization and achieving optimal dependency on gradient similarity.
Contribution
The paper develops novel multi-timescale PDHG algorithms with convergence guarantees under arbitrary update rates and applies them to distributed optimization with improved communication efficiency.
Findings
MT-PDHG and AMT-PDHG converge under arbitrary update rates for different dual blocks.
The algorithms improve efficiency in distributed optimization with heterogeneous local objectives.
For non-smooth objectives, the algorithms achieve a linear, optimal dependency on gradient similarity.
Abstract
We propose two variants of the Primal Dual Hybrid Gradient (PDHG) algorithm for saddle point problems with block decomposable duals, hereafter called Multi-Timescale PDHG (MT-PDHG) and its accelerated variant (AMT-PDHG). Through novel mixtures of Bregman divergence and multi-timescale extrapolations, our MT-PDHG and AMT-PDHG converge under arbitrary updating rates for different dual blocks while remaining fully deterministic and robust to extreme delays in dual updates. We further apply our (A)MT-PDHG, augmented with the gradient sliding techniques introduced in Lan et al. (2020) and Lan (2016), to distributed optimization. The flexibility in choosing different updating rates for different blocks allows more refined control over the communication rounds between different pairs of agents, thereby improving efficiency in settings with heterogeneity in local objectives and…
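For orientation, here is a minimal sketch of the standard single-timescale PDHG iteration that the paper's variants build on. It is not the authors' MT-PDHG or AMT-PDHG. The test problem (an l1-regularized least-squares objective), the function name `pdhg_l1`, and the step sizes are all assumptions chosen for illustration.

```python
import numpy as np

def pdhg_l1(K, b, lam, tau, sigma, iters=2000):
    """Standard PDHG for min_x 0.5*||x - b||^2 + lam*||K x||_1.

    Saddle-point form (illustrative choice, not from the paper):
        min_x max_{||y||_inf <= lam}  <K x, y> + 0.5*||x - b||^2
    Step sizes are assumed to satisfy tau * sigma * ||K||_2^2 < 1.
    """
    x = np.zeros(K.shape[1])
    x_bar = x.copy()              # extrapolated primal point
    y = np.zeros(K.shape[0])
    for _ in range(iters):
        # Dual ascent step, then projection onto the l_inf ball of radius lam.
        y = np.clip(y + sigma * (K @ x_bar), -lam, lam)
        # Primal descent step: closed-form prox of tau * 0.5*||x - b||^2.
        x_new = (x - tau * (K.T @ y) + tau * b) / (1.0 + tau)
        # Extrapolation with the usual factor theta = 1.
        x_bar = 2.0 * x_new - x
        x = x_new
    return x

# With K = I the minimizer is the soft-thresholding of b at level lam,
# which gives a simple sanity check for the iteration.
K = np.eye(3)
b = np.array([3.0, -0.5, 1.5])
x_hat = pdhg_l1(K, b, lam=1.0, tau=0.9, sigma=0.9)
```

In this sketch every dual coordinate is updated at every iteration with the same step size; the abstract's multi-timescale variants relax exactly that by letting different dual blocks update at different rates.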
