Decentralized Hyper-Gradient Computation over Time-Varying Directed Networks

Naoyuki Terashita; Satoshi Hara

arXiv:2210.02129·stat.ML·June 14, 2023

TL;DR

This paper proposes a communication-efficient method for hyper-gradient computation in decentralized federated learning over time-varying directed networks, enabling influence estimation and personalization with theoretical convergence guarantees.

Contribution

It introduces a new optimality condition and uses Push-Sum for hyper-gradient estimation, reducing communication costs and allowing operation over dynamic directed networks.

Findings

01. The estimator converges to the true hyper-gradient both theoretically and empirically.

02. It enables decentralized influence estimation and personalization in dynamic network settings.

03. The method reduces communication overhead compared to prior Hessian-based approaches.

Abstract

This paper addresses communication issues that arise when estimating hyper-gradients in decentralized federated learning (FL). Hyper-gradients in decentralized FL quantify how the performance of the globally shared optimal model is influenced by perturbations in clients' hyper-parameters. In prior work, clients trace this influence by communicating Hessian matrices over a static undirected network, resulting in (i) excessive communication costs and (ii) an inability to make use of more efficient and robust networks, namely time-varying directed networks. To solve these issues, we introduce an alternative optimality condition for FL using an averaging operation on model parameters and gradients. We then employ Push-Sum, a consensus optimization technique for time-varying directed networks, as the averaging operation. As a result, the hyper-gradient estimator derived from our…
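To make the averaging step concrete, here is a minimal sketch of generic Push-Sum averaging over a time-varying directed network. It is not taken from the paper or its repository: the function names (`column_stochastic`, `push_sum`), the two 4-node graphs, and the equal-split weighting are all assumptions chosen for illustration. Each node repeatedly splits its value and a scalar weight among its out-neighbors, and the ratio of the two approaches the network-wide average.

```python
import numpy as np


def column_stochastic(adj):
    """Build Push-Sum mixing weights from a directed adjacency matrix.

    adj[i, j] != 0 means node j sends to node i. Every node also keeps a
    share for itself (self-loop) and splits its mass equally among its
    out-neighbors, so each column sums to 1.
    """
    n = adj.shape[0]
    links = ((adj != 0) | np.eye(n, dtype=bool)).astype(float)
    return links / links.sum(axis=0, keepdims=True)


def push_sum(values, graphs, num_rounds):
    """Estimate the network-wide average of `values` at every node.

    values: array of shape (n_nodes, dim), one row per node.
    graphs: list of adjacency matrices, cycled through to mimic a
            time-varying directed network (hypothetical schedule).
    """
    x = np.asarray(values, dtype=float).copy()  # running numerators
    w = np.ones(x.shape[0])                     # running Push-Sum weights
    for t in range(num_rounds):
        p = column_stochastic(graphs[t % len(graphs)])
        x = p @ x  # each node sums the shares pushed to it
        w = p @ w  # the weights are mixed the same way
    return x / w[:, None]  # de-biased per-node estimates


# Two hypothetical 4-node graphs that alternate over time.
ring = np.zeros((4, 4))
for j in range(4):
    ring[(j + 1) % 4, j] = 1  # edge j -> j+1 (directed ring)
sparse = np.zeros((4, 4))
sparse[2, 0] = sparse[3, 1] = sparse[0, 3] = 1  # edges 0->2, 1->3, 3->0

local_values = np.array([[1.0], [2.0], [3.0], [10.0]])
estimates = push_sum(local_values, [ring, sparse], num_rounds=300)
print(estimates.ravel())  # every entry should be close to the mean, 4.0
```

The mixing matrices only need to be column-stochastic, which each sender can arrange locally by knowing its own out-degree. Dividing by the weight `w` corrects the bias that directed, unbalanced links would otherwise introduce. In the paper's setting the averaged quantities would be model parameters and gradients rather than the toy scalars used here.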

Code & Models

Repositories

hitachi-rd-cv/pdbo-hgp
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Distributed Sensor Networks and Detection Algorithms · Distributed Control Multi-Agent Systems