M3R: Increased performance for in-memory Hadoop jobs
Avraham Shinnar, David Cunningham, Benjamin Herta, Vijay Saraswat

TL;DR
M3R is an in-memory implementation of the Hadoop MapReduce API that substantially speeds up workloads that fit in cluster memory. It gives up resilience to do so, which makes it suited to clusters with a high mean time to failure.
Contribution
The paper introduces M3R, an in-memory engine for the Hadoop MapReduce API that trades resilience for substantially better performance on workloads that fit in cluster memory.
Findings
Achieves up to 45x speedup over the Hadoop engine on some input sizes for sparse matrix-vector multiply
Supports existing Hadoop jobs without modification
Provides extensions to the Hadoop MapReduce API that let jobs run faster on M3R without affecting their performance under the Hadoop engine
Abstract
Main Memory Map Reduce (M3R) is a new implementation of the Hadoop Map Reduce (HMR) API targeted at online analytics on high mean-time-to-failure clusters. It does not support resilience, and supports only those workloads which can fit into cluster memory. In return, it can run HMR jobs unchanged -- including jobs produced by compilers for higher-level languages such as Pig, Jaql, and SystemML and interactive front-ends like IBM BigSheets -- while providing significantly better performance than the Hadoop engine on several workloads (e.g. 45x on some input sizes for sparse matrix vector multiply). M3R also supports extensions to the HMR API which can enable Map Reduce jobs to run faster on the M3R engine, while not affecting their performance under the Hadoop engine.
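To make the in-memory idea concrete, here is a minimal single-process sketch in Python of a map, shuffle, and reduce pass whose intermediate data stays in memory, applied to sparse matrix-vector multiply. It is an illustration only: it does not use the Hadoop MapReduce API and is not how M3R is implemented, and the names `run_in_memory` and `spmv` are hypothetical.

```python
from collections import defaultdict


def run_in_memory(records, mapper, reducer):
    """Run one map/shuffle/reduce pass with all intermediate data in memory.

    Hypothetical helper for illustration; not part of Hadoop or M3R.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            # Shuffle step: group mapper output by key in a dictionary
            # instead of writing it to disk.
            groups[out_key].append(out_value)

    output = []
    for out_key in sorted(groups):
        output.extend(reducer(out_key, groups[out_key]))
    return output


def spmv(entries, vector):
    """Multiply a sparse matrix, given as ((row, col), value) pairs, by a dense vector."""

    def mapper(position, a_ij):
        row, col = position
        # Emit one partial product per nonzero entry, keyed by row.
        yield row, a_ij * vector[col]

    def reducer(row, products):
        # Sum the partial products belonging to a single row.
        yield row, sum(products)

    return dict(run_in_memory(entries, mapper, reducer))


# Assumed toy input: a 3x3 sparse matrix with four nonzero entries.
entries = [((0, 0), 2.0), ((0, 2), 1.0), ((1, 1), 3.0), ((2, 0), 4.0)]
vector = [1.0, 2.0, 3.0]
result = spmv(entries, vector)
```

Because nothing is persisted between the map and reduce steps, a failure part-way through loses all intermediate state. That is the resilience trade-off the abstract describes, shown here at toy scale.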
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Graph Theory and Algorithms
