Machine Learning and Cloud Computing: Survey of Distributed and SaaS Solutions
Daniel Pop

TL;DR
This survey reviews how cloud computing has transformed machine learning by enabling distributed processing, SaaS solutions, and new libraries, addressing challenges of large-scale data analysis.
Contribution
It provides a comprehensive overview of industrial and academic developments in distributed ML and SaaS solutions, highlighting the impact of cloud computing on the field.
Findings
Apache Mahout, GraphLab, and Jubatus are among the most prominent distributed ML libraries built on parallel computing frameworks.
Popular statistics tools and libraries, such as the R system and Python, are being deployed in the cloud.
ML as SaaS is emerging as a significant trend.
Abstract
Applying popular machine learning algorithms to large amounts of data has raised new challenges for ML practitioners. Traditional ML libraries do not handle huge datasets well, so new approaches were needed. Parallelization using modern parallel computing frameworks, such as MapReduce, CUDA, or Dryad, has gained popularity and acceptance, resulting in new ML libraries built on top of these frameworks. We briefly introduce the most prominent industrial and academic outcomes, such as Apache Mahout, GraphLab, and Jubatus. We then investigate how the cloud computing paradigm has impacted the field of ML. The first direction is popular statistics tools and libraries (the R system, Python) deployed in the cloud. A second line of products augments existing tools with plugins that allow users to create a Hadoop cluster in the cloud and run jobs on it. Next on the list are…
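Illustration
The abstract's point about MapReduce-style parallelization can be made concrete with a small sketch. It is not taken from the survey or from any of the libraries it names. It only assumes that a model's training can be written as sums over the data, so each partition is summarized independently (map) and the summaries are added together (reduce). The function names `map_partition`, `reduce_stats`, and `fit_line` are hypothetical, and the partitions are toy in-memory lists standing in for data blocks on a cluster.

```python
from functools import reduce


def map_partition(rows):
    """Map step: summarize one partition of (x, y) pairs as partial sums."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def reduce_stats(a, b):
    """Reduce step: merge two partial summaries by element-wise addition."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(partitions):
    """Fit y = slope * x + intercept by least squares from merged sums."""
    n, sx, sy, sxx, sxy = reduce(reduce_stats, map(map_partition, partitions))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Toy data (an assumption for illustration): points on y = 2x + 1,
# split into three partitions as if stored on three workers.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]
slope, intercept = fit_line(partitions)
print(slope, intercept)
```

Because the reduce step is plain addition, the partial summaries can be merged in any order, which is what lets a framework run the map step on many machines at once. In a real deployment the built-in `map` call would be replaced by the framework's own distributed map.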
Taxonomy
Topics: Graph Theory and Algorithms · Cloud Computing and Resource Management · Scientific Computing and Data Management
