Revisiting Large Scale Distributed Machine Learning

Radu Cristian Ionescu

arXiv:1507.01461·cs.DC·July 7, 2015

TL;DR

This paper reviews state-of-the-art distributed machine learning algorithms, emphasizing security and communication efficiency in client-server models. It focuses on healthcare applications and proposes scalable, low-cost algorithms.

Contribution

It provides a comprehensive survey of distributed learning methods and introduces new algorithms optimized for healthcare data with low communication and high scalability.

Findings

01 Thorough overview of supervised and unsupervised distributed algorithms

02 Introduction of an asynchronous distributed learning algorithm

03 Empirical evaluation of the k-windows clustering algorithm

Abstract

Nowadays, with the widespread adoption of smartphones and other portable gadgets equipped with a variety of sensors, data is ubiquitously available, and the focus of machine learning has shifted from inferring from small training samples to dealing with large-scale, high-dimensional data. In domains such as personal healthcare applications, which motivate this survey, distributed machine learning is a promising line of research, both for scaling up learning algorithms and, above all, for dealing with data that is inherently produced at different locations. This report offers a thorough overview of state-of-the-art algorithms for distributed machine learning, for both supervised and unsupervised learning, ranging from simple linear logistic regression to graphical models and clustering. We propose future directions for most categories, specific to the potential personal healthcare…

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Anomaly Detection Techniques and Applications · Data Stream Mining Techniques