A Survey on Distributed Machine Learning
Joost Verbraeken, Matthijs Wolting, Jonathan Katzy, Jeroen Kloppenburg, Tim Verbelen, Jan S. Rellermeyer

TL;DR
This survey reviews the challenges, techniques, and systems involved in distributed machine learning, highlighting its importance for handling large-scale data and models beyond the capabilities of centralized systems.
Contribution
It provides a comprehensive overview of current distributed machine learning methods, challenges, and available systems, filling a gap in synthesizing recent advancements.
Findings
Distributed ML addresses large data and model size challenges.
Efficiently parallelizing training and keeping the resulting model coherent are key challenges.
Various systems support distributed machine learning implementations.
Abstract
The demand for artificial intelligence has grown significantly over the last decade and this growth has been fueled by advances in machine learning techniques and the ability to leverage hardware acceleration. However, in order to increase the quality of predictions and render machine learning solutions feasible for more complex applications, a substantial amount of training data is required. Although small machine learning models can be trained with modest amounts of data, the input for training larger models such as neural networks grows exponentially with the number of parameters. Since the demand for processing training data has outpaced the increase in computation power of computing machinery, there is a need for distributing the machine learning workload across multiple machines, and turning the centralized into a distributed system. These distributed systems present new…
