Iterative MapReduce for Large Scale Machine Learning
Joshua Rosen, Neoklis Polyzotis, Vinayak Borkar, Yingyi Bu, Michael J. Carey, Markus Weimer, Tyson Condie, Raghu Ramakrishnan

TL;DR
This paper introduces Iterative MapReduce, an extension to the traditional MapReduce paradigm, enabling efficient iterative machine learning on Big Data by supporting looping as a first-class construct, thus improving performance and programmability.
Contribution
The paper proposes a new programming model, Iterative MapReduce, together with an optimizer, to address the lack of iteration support in traditional MapReduce for machine learning tasks.
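To make the idea of iteration around a map and reduce step concrete, here is a minimal single-process sketch in Python. It is an illustration only, not the paper's API or optimizer: the names `map_gradient`, `reduce_gradients`, and `iterate`, the toy data, and the choice of gradient descent on a one-parameter linear model are all assumptions made for this example.

```python
from functools import reduce
from typing import Sequence, Tuple

Point = Tuple[float, float]  # one training example (x, y)


def map_gradient(w: float, partition: Sequence[Point]) -> Tuple[float, int]:
    """Map step: partial squared-error gradient for y ~ w * x on one partition."""
    grad = sum(2.0 * (w * x - y) * x for x, y in partition)
    return grad, len(partition)


def reduce_gradients(a: Tuple[float, int], b: Tuple[float, int]) -> Tuple[float, int]:
    """Reduce step: add partial gradients and example counts."""
    return a[0] + b[0], a[1] + b[1]


def iterate(partitions, w0, step_size=0.05, tol=1e-9, max_iters=1000):
    """Hypothetical loop construct: repeat map + reduce until the model stops changing."""
    w = w0
    for i in range(1, max_iters + 1):
        partials = [map_gradient(w, p) for p in partitions]
        grad, n = reduce(reduce_gradients, partials)
        new_w = w - step_size * grad / n  # averaged gradient step
        if abs(new_w - w) < tol:
            return new_w, i
        w = new_w
    return w, max_iters


# Toy data drawn from y = 2x, split into two partitions (assumed for illustration).
partitions = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w, iters = iterate(partitions, w0=0.0)
print(f"w = {w:.6f} after {iters} iterations")
```

In plain MapReduce, the `for` loop inside `iterate` would live in an external driver that resubmits a job per iteration. Treating the loop as part of the program is what would let a system see the whole iterative computation rather than one job at a time.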
Findings
System-optimized programs are competitive with state-of-the-art solutions.
The authors give theoretical justifications for the optimizer's key steps.
The model supports most machine learning techniques efficiently.
Abstract
Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one of the foundational disciplines for data analysis, summarization and inference - on Big Data has become routine at most organizations that operate large clouds, usually based on systems such as Hadoop that support the MapReduce programming paradigm. It is now widely recognized that while MapReduce is highly scalable, it suffers from a critical weakness for machine learning: it does not support iteration. Consequently, one has to program around this limitation, leading to fragile, inefficient code. Further, reliance on the programmer is inherently flawed in a multi-tenanted cloud environment, since the programmer does not have visibility into the state…
Taxonomy
Topics: Cloud Computing and Resource Management · Graph Theory and Algorithms · Parallel Computing and Optimization Techniques
