Revisiting Large Scale Distributed Machine Learning
Radu Cristian Ionescu

TL;DR
This paper reviews state-of-the-art distributed machine learning algorithms, emphasizing security and communication efficiency in client-server models. It focuses on healthcare applications and proposes scalable algorithms with low communication cost.
Contribution
It provides a comprehensive survey of distributed learning methods and introduces new algorithms for healthcare data that require little communication and scale well.
Findings
Thorough overview of supervised and unsupervised distributed algorithms
Introduction of an asynchronous distributed learning algorithm
Empirical evaluation of the k-windows clustering algorithm
Abstract
Nowadays, with the widespread adoption of smartphones and other portable gadgets equipped with a variety of sensors, data is ubiquitously available, and the focus of machine learning has shifted from inferring from small training samples to dealing with large-scale, high-dimensional data. In domains such as personal healthcare applications, which motivate this survey, distributed machine learning is a promising line of research, both for scaling up learning algorithms and, above all, for dealing with data that is inherently produced at different locations. This report offers a thorough overview of state-of-the-art algorithms for distributed machine learning, for both supervised and unsupervised learning, ranging from simple linear logistic regression to graphical models and clustering. We propose future directions for most categories, specific to the potential personal healthcare…
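The client-server setting described in the abstract, where data is produced at different sites and stays there, can be illustrated with a minimal sketch of distributed logistic regression using synchronous gradient averaging. This is a standard textbook scheme chosen for illustration, not the asynchronous algorithm the paper proposes. The function names, the synthetic "sites", and the learning-rate and round settings are all hypothetical. Each client computes the gradient of the logistic loss on its own records and sends only that vector to the server, which sums the vectors and takes one gradient step.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def local_gradient(w, data):
    """Gradient of the summed logistic loss on one client's private records.
    In this simulation, only this d-dimensional vector is 'sent' to the server."""
    grad = [0.0] * len(w)
    for x, y in data:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return grad

def train_distributed(clients, dim, lr=0.5, rounds=200):
    """Hypothetical server loop: broadcast w, collect client gradients, average, update."""
    w = [0.0] * dim
    n_total = sum(len(c) for c in clients)
    for _ in range(rounds):
        grads = [local_gradient(w, c) for c in clients]  # one message per client per round
        for j in range(dim):
            w[j] -= lr * sum(g[j] for g in grads) / n_total
    return w

random.seed(0)

def make_site(n):
    # Hypothetical synthetic site: features (1, x1, x2), label 1 if x1 + x2 > 0
    site = []
    for _ in range(n):
        x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
        site.append(((1.0, x1, x2), 1 if x1 + x2 > 0 else 0))
    return site

clients = [make_site(n) for n in (50, 80, 120)]  # three sites of different sizes
w = train_distributed(clients, dim=3)
```

Because the server divides the summed client gradients by the total number of records, each round performs the same update as full-batch gradient descent on the pooled data, while the raw records never leave their site. The communication cost is one d-dimensional vector per client per round, independent of how many records a site holds, so reducing the number of rounds or compressing these vectors is a natural place for communication-efficient variants to improve on this baseline.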
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Anomaly Detection Techniques and Applications · Data Stream Mining Techniques
