Distributed Gradient-Regularized Newton Method: Scheduled Consensus and O(epsilon^{-1}) Global Iteration Complexity
Wei Hu, Pengcheng Xie, Ya-Xiang Yuan, and Li Zhang

TL;DR
DisGrem is a decentralized second-order method for convex consensus optimization that, after a burn-in phase, matches the O(ε^{-1}) iteration complexity of centralized regularized Newton while using O(ε^{-1} log(1/ε)) neighbor communication rounds.
Contribution
This paper introduces DisGrem, a decentralized Newton method with scheduled consensus. It reduces each network-wide update to an inexact centralized regularized Newton step and thereby matches the iteration complexity of the centralized regularized Newton method.
Findings
Achieves O(ε^{-1}) iteration complexity for driving the gradient norm below ε, after a burn-in phase and under a bounded-iterates assumption.
Requires O(ε^{-1} log(1/ε)) neighbor communication rounds.
Outperforms baseline methods on nine benchmark problems, with all instances reaching relF <= 10^{-6}.
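The communication count above fits a simple accounting, sketched here under our own assumptions and not the paper's actual schedule. We assume a hypothetical ring of agents with a symmetric doubly stochastic mixing matrix W and a constant per-iteration consensus target δ = ε. We also assume T = ceil(c/ε) iterations for a hypothetical constant c. K gossip rounds shrink disagreement by at most σ^K, where σ is the second-largest eigenvalue magnitude of W. Reaching δ therefore takes K = ceil(log(1/δ)/log(1/σ)) = O(log(1/ε)) rounds per iteration, or O(ε^{-1} log(1/ε)) rounds in total. All function names are illustrative.

```python
import math

import numpy as np


def ring_mixing_matrix(n):
    """Assumed topology: ring with weight 1/3 on self and on each of two neighbors."""
    assert n >= 3, "need n >= 3 so the two neighbors are distinct"
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def mixing_rate(W):
    """sigma: second-largest eigenvalue magnitude of a symmetric doubly stochastic W."""
    return np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]


def rounds_for_accuracy(sigma, delta):
    """Smallest integer K with sigma**K <= delta."""
    return math.ceil(math.log(1.0 / delta) / math.log(1.0 / sigma))


def total_rounds(eps, sigma, c=1.0):
    """Hypothetical schedule: ceil(c/eps) iterations, each mixed to accuracy delta = eps."""
    return math.ceil(c / eps) * rounds_for_accuracy(sigma, eps)


# Usage: check that K rounds really shrink the disagreement by at least delta.
W = ring_mixing_matrix(8)
sigma = mixing_rate(W)
K = rounds_for_accuracy(sigma, 1e-3)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                  # one row per agent
dev0 = np.linalg.norm(X - X.mean(axis=0))
for _ in range(K):
    X = W @ X                                # one gossip round; W preserves the mean
ratio = np.linalg.norm(X - X.mean(axis=0)) / dev0
print(sigma, K, ratio, total_rounds(1e-3, sigma))
```

On an 8-agent ring σ is about 0.805, so each iteration needs a few dozen rounds for δ = 10^{-3}. Larger or sparser graphs push σ toward 1, which inflates the constant in front of the log factor.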
Abstract
We propose DisGrem, a fully decentralized second-order method for convex consensus optimization over networks. Each agent solves a local Newton system with vanishing gradient-norm regularization and an eigenvalue-shift stabilizer, communicating through a two-stage gossip-mixing mechanism. We introduce a reference-step framework that reduces the network-wide update to an inexact centralized regularized Newton step, replacing the static Hessian-heterogeneity assumptions of prior work with an increment-based dispersion analysis that imposes no irreducible accuracy floor. Under a bounded-iterates assumption, after a burn-in phase whose order is controlled by the scheduled consensus accuracy, the post-burn-in phase achieves an O(ε^{-1}) iteration complexity for driving the gradient norm below ε, matching the centralized regularized Newton rate without line search or stepsize…
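The loop below is a toy decentralized regularized Newton sketch that illustrates the kind of mechanism the abstract describes. It is not DisGrem, and every choice in it is our own assumption:
- Each agent holds a private logistic-regression loss on hypothetical random data.
- The regularizer uses the square-root rule λ = sqrt(L_H·||g||) familiar from centralized gradient-regularized Newton methods. The abstract only says "vanishing gradient-norm regularization".
- The eigenvalue shift is a simple guard, max(0, -λ_min(H)), of our choosing.
- "Two-stage" mixing is read as (1) gossiping local gradients and Hessians, then (2) gossiping the updated iterates, each for a fixed number of rounds.

Names such as toy_decentralized_grn, L_H, iters and rounds are hypothetical.

```python
# Toy sketch under stated assumptions; not the DisGrem algorithm.
import numpy as np


def ring_mixing_matrix(n):
    """Assumed topology: ring with weight 1/3 on self and on each of two neighbors."""
    assert n >= 3, "need n >= 3 so the two neighbors are distinct"
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def gossip(W, Z, rounds):
    """Apply `rounds` mixing steps Z <- W Z along the agent axis (axis 0)."""
    flat = Z.reshape(Z.shape[0], -1)
    for _ in range(rounds):
        flat = W @ flat
    return flat.reshape(Z.shape)


def local_grad_hess(A, b, x, l2):
    """Gradient and Hessian of mean log(1 + exp(-b * a^T x)) + (l2/2)||x||^2."""
    s = 1.0 / (1.0 + np.exp(b * (A @ x)))      # sigmoid(-b * a^T x)
    g = -(A.T @ (b * s)) / len(b) + l2 * x
    H = (A.T * (s * (1.0 - s))) @ A / len(b) + l2 * np.eye(A.shape[1])
    return g, H


def regularized_newton_step(x, g, H, L_H):
    """x - (H + lam*I)^{-1} g, lam = sqrt(L_H*||g||) plus a hypothetical eigenvalue shift."""
    shift = max(0.0, -np.linalg.eigvalsh(H).min())   # only active if H is indefinite
    lam = np.sqrt(L_H * np.linalg.norm(g)) + shift
    return x - np.linalg.solve(H + lam * np.eye(len(x)), g)


def toy_decentralized_grn(data, W, iters=50, rounds=30, L_H=1.0, l2=0.1):
    n, d = len(data), data[0][0].shape[1]
    X = np.zeros((n, d))                                       # agent i's copy is X[i]
    for _ in range(iters):
        local = [local_grad_hess(A, b, X[i], l2) for i, (A, b) in enumerate(data)]
        G = gossip(W, np.stack([g for g, _ in local]), rounds)  # stage 1: mix gradients
        H = gossip(W, np.stack([h for _, h in local]), rounds)  # ... and Hessians
        X = np.stack([regularized_newton_step(X[i], G[i], H[i], L_H) for i in range(n)])
        X = gossip(W, X, rounds)                               # stage 2: mix iterates
    return X


# Usage on hypothetical data: 5 agents, 40 private samples each, 3 features.
rng = np.random.default_rng(0)
w_true = rng.normal(size=3)
data = []
for _ in range(5):
    A = rng.normal(size=(40, 3))
    data.append((A, np.sign(A @ w_true + 0.5 * rng.normal(size=40))))
X = toy_decentralized_grn(data, ring_mixing_matrix(5))
x_bar = X.mean(axis=0)
global_grad = np.mean([local_grad_hess(A, b, x_bar, 0.1)[0] for A, b in data], axis=0)
print(np.linalg.norm(global_grad), np.abs(X - x_bar).max())
```

Gossip with a doubly stochastic W preserves the agents' average exactly, so the mean iterate follows an approximate centralized regularized Newton step. With 30 rounds per stage on a 5-agent ring, the copies are expected to agree closely and the global gradient norm at the average to fall far below 10^{-6}. Cutting the rounds saves communication but leaves larger consensus error, which is the trade-off a consensus schedule is meant to balance.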
