Network Support for High-performance Distributed Machine Learning

Francesco Malandrino, Carla Fabiana Chiasserini, Nuria Molner, and Antonio De La Oliva

arXiv:2102.03394 · cs.NI · July 7, 2022

TL;DR

This paper introduces a system model and an algorithm that shape the logical network topology of distributed machine learning around the learning task. The algorithm chooses which nodes cooperate and how many iterations to run, so that learning-performance targets are met at minimum cost.

Contribution

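The paper proposes a system model for network-aware distributed learning that accounts for both learning nodes (which perform computations) and information nodes (which provide data). It also proposes an algorithm, DoubleClimb, that selects the cooperating nodes and the number of iterations so as to minimize the learning cost while meeting the target prediction error and execution time.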

Findings

1. DoubleClimb's performance closely matches the optimum.
2. DoubleClimb outperforms existing methods in real-world network scenarios.
3. The approach effectively balances learning cost, prediction accuracy, and execution time.

Abstract

The traditional approach to distributed machine learning is to adapt learning algorithms to the network, e.g., reducing updates to curb overhead. Networks based on intelligent edge, instead, make it possible to follow the opposite approach, i.e., to define the logical network topology around the learning task to perform, so as to meet the desired learning performance. In this paper, we propose a system model that captures such aspects in the context of supervised machine learning, accounting for both learning nodes (that perform computations) and information nodes (that provide data). We then formulate the problem of selecting (i) which learning and information nodes should cooperate to complete the learning task, and (ii) the number of iterations to perform, in order to minimize the learning cost while meeting the target prediction error and execution time. After proving important…
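The abstract frames the task as a constrained optimization: choose a set of learning nodes, a set of information nodes, and an iteration count that minimize cost subject to an error target and a time target. The following is a minimal sketch of that problem shape, solved by plain exhaustive search. It is not DoubleClimb and not the paper's system model: the node attributes (`cost_per_iter`, `samples_per_sec`, `link_delay`, `samples`, `cost`), the toy error model `toy_error`, the per-iteration time formula, and all numbers are hypothetical assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from math import sqrt

# Hypothetical toy model for illustration only; NOT the paper's model or DoubleClimb.


@dataclass(frozen=True)
class LearningNode:
    name: str
    cost_per_iter: float    # hypothetical cost of one training iteration on this node
    samples_per_sec: float  # hypothetical processing speed
    link_delay: float       # hypothetical per-iteration synchronization delay


@dataclass(frozen=True)
class InfoNode:
    name: str
    samples: int            # training samples this node contributes
    cost: float             # hypothetical one-off cost of using its data


def nonempty_subsets(items):
    """Yield every non-empty subset of `items` as a tuple."""
    for r in range(1, len(items) + 1):
        yield from combinations(items, r)


def toy_error(total_samples, iterations, a=1.0, b=2.0):
    """Hypothetical prediction-error model: falls with more data and more iterations."""
    return a / sqrt(total_samples) + b / iterations


def plan(learners, infos, max_error, max_time, max_iters=100):
    """Exhaustively search for the cheapest (learners, infos, iterations) meeting both targets.

    Returns (cost, learner names, info names, iterations), or None if nothing is feasible.
    """
    best = None
    for ls in nonempty_subsets(learners):
        speed = sum(l.samples_per_sec for l in ls)
        iter_cost = sum(l.cost_per_iter for l in ls)
        sync = max(l.link_delay for l in ls)  # slowest link gates each iteration
        for infs in nonempty_subsets(infos):
            n = sum(i.samples for i in infs)
            data_cost = sum(i.cost for i in infs)
            for k in range(1, max_iters + 1):
                if toy_error(n, k) > max_error:
                    continue  # error target not met yet; more iterations lower the error
                # Cost and time both grow with k, so the first k meeting the error
                # target is the only one worth checking for this pair of subsets.
                if k * (n / speed + sync) <= max_time:
                    cost = k * iter_cost + data_cost
                    if best is None or cost < best[0]:
                        best = (cost, [l.name for l in ls], [i.name for i in infs], k)
                break
    return best


learners = [LearningNode("A", 1.0, 100.0, 0.1), LearningNode("B", 3.0, 400.0, 0.5)]
infos = [InfoNode("X", 100, 5.0), InfoNode("Y", 400, 2.0)]
print(plan(learners, infos, max_error=0.24, max_time=20.0))  # (20.0, ['A'], ['X'], 15)
print(plan(learners, infos, max_error=0.24, max_time=15.0))  # (46.0, ['A', 'B'], ['Y'], 11)
```

With a time budget of 20, the single cheap learner with a small data set suffices; tightening the budget to 15 forces both learners to cooperate at a higher cost, illustrating the cost/time trade-off described above. Exhaustive search grows exponentially with the number of nodes, so this sketch only serves to make the problem concrete at toy scale.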
