Network Support for High-performance Distributed Machine Learning
Francesco Malandrino, Carla Fabiana Chiasserini, Nuria Molner, and Antonio De La Oliva

TL;DR
This paper introduces a system model and an algorithm for shaping the logical network topology around a distributed machine learning task, choosing which nodes cooperate and how many iterations they run so as to meet learning targets at low cost.
Contribution
It proposes a novel system model for network-aware distributed learning and an algorithm, DoubleClimb, that selects the cooperating nodes and the number of iterations so as to minimize the learning cost while meeting the target prediction error and execution time.
Findings
DoubleClimb achieves near-optimal performance, staying close to the theoretical optimum.
The algorithm outperforms existing methods in real-world network scenarios.
The approach effectively balances learning cost, accuracy, and execution time.
Abstract
The traditional approach to distributed machine learning is to adapt learning algorithms to the network, e.g., reducing updates to curb overhead. Networks based on intelligent edge, instead, make it possible to follow the opposite approach, i.e., to define the logical network topology around the learning task to perform, so as to meet the desired learning performance. In this paper, we propose a system model that captures such aspects in the context of supervised machine learning, accounting for both learning nodes (that perform computations) and information nodes (that provide data). We then formulate the problem of selecting (i) which learning and information nodes should cooperate to complete the learning task, and (ii) the number of iterations to perform, in order to minimize the learning cost while meeting the target prediction error and execution time. After proving important…
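To make the shape of the selection problem concrete, the following is a minimal sketch that solves a toy instance by exhaustive search. It is not DoubleClimb and does not reproduce the paper's system model: the error, time, and cost formulas, the field names (`speed`, `cost`, `samples`), and the function `brute_force_select` are all hypothetical stand-ins chosen only to show how node selection and iteration count interact with the error and time targets.

```python
from itertools import combinations


def nonempty_subsets(names):
    """Yield every non-empty subset of names as a sorted tuple."""
    ordered = sorted(names)
    for size in range(1, len(ordered) + 1):
        yield from combinations(ordered, size)


def brute_force_select(learning, info, max_iters, max_error, max_time):
    """Exhaustively pick learning nodes, information nodes, and an iteration
    count that minimize a toy cost under toy error and time constraints.

    All three formulas below are illustrative assumptions, not the paper's.
    Returns (cost, learning_nodes, info_nodes, iterations), or None if no
    combination is feasible.
    """
    best = None
    for l_set in nonempty_subsets(learning):
        speed = sum(learning[l]["speed"] for l in l_set)      # samples per time unit
        iter_cost = sum(learning[l]["cost"] for l in l_set)   # cost per iteration
        for i_set in nonempty_subsets(info):
            data = sum(info[i]["samples"] for i in i_set)
            data_cost = sum(info[i]["cost"] for i in i_set)   # one-off cost of the data
            for k in range(1, max_iters + 1):
                error = 1.0 / (1.0 + data * k / 1000.0)  # assumed: falls with data and iterations
                time = k * data / speed                  # assumed: each iteration processes all data
                if error > max_error or time > max_time:
                    continue
                cost = data_cost + k * iter_cost
                if best is None or cost < best[0]:
                    best = (cost, l_set, i_set, k)
    return best


# Hypothetical toy instance.
learning_nodes = {
    "l1": {"speed": 100, "cost": 1.0},
    "l2": {"speed": 200, "cost": 3.0},
}
info_nodes = {
    "i1": {"samples": 1000, "cost": 2.0},
    "i2": {"samples": 500, "cost": 1.0},
}

relaxed = brute_force_select(learning_nodes, info_nodes, 30, 0.11, 100)
tight = brute_force_select(learning_nodes, info_nodes, 30, 0.11, 60)
print(relaxed)
print(tight)
```

In this toy instance, tightening the time budget from 100 to 60 forces the search to switch from the cheap, slow learning node to the faster, costlier one, which is the kind of cost, accuracy, and time trade-off the formulation is meant to capture. Exhaustive search scales exponentially with the number of nodes, so it is useful only as a reference point on tiny instances.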
