M3R: Increased performance for in-memory Hadoop jobs

Avraham Shinnar; David Cunningham; Benjamin Herta; Vijay Saraswat

arXiv:1208.4168·cs.DB·August 22, 2012·56 cites

Open Access

TL;DR

M3R is an in-memory implementation of the Hadoop MapReduce API that runs some workloads significantly faster than the Hadoop engine by giving up resilience. It targets high mean-time-to-failure clusters whose memory can hold the entire workload.

Contribution

The paper introduces M3R, an in-memory engine for the Hadoop MapReduce API that trades resilience for substantially better performance on workloads that fit in cluster memory.

Findings

01

Achieves up to 45x speedup over the Hadoop engine on some input sizes for sparse matrix vector multiply

02

Runs existing Hadoop MapReduce jobs unchanged, including jobs produced by compilers for Pig, Jaql, and SystemML

03

Provides extensions to the HMR API that let jobs run faster on M3R without affecting their performance under the Hadoop engine

Abstract

Main Memory Map Reduce (M3R) is a new implementation of the Hadoop Map Reduce (HMR) API targeted at online analytics on high mean-time-to-failure clusters. It does not support resilience, and supports only those workloads which can fit into cluster memory. In return, it can run HMR jobs unchanged -- including jobs produced by compilers for higher-level languages such as Pig, Jaql, and SystemML and interactive front-ends like IBM BigSheets -- while providing significantly better performance than the Hadoop engine on several workloads (e.g. 45x on some input sizes for sparse matrix vector multiply). M3R also supports extensions to the HMR API which can enable Map Reduce jobs to run faster on the M3R engine, while not affecting their performance under the Hadoop engine.
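To make the sparse matrix vector multiply workload mentioned in the abstract concrete, here is a toy sketch of that computation written as a map step and a reduce step whose intermediate pairs never leave memory. It is an illustration only, not M3R code and not the Hadoop API: the function names (`map_phase`, `reduce_phase`, `spmv`) and the coordinate-triple input format are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(matrix_entries, vector):
    """Emit one (row, partial_product) pair per nonzero matrix entry.

    matrix_entries is an iterable of (row, col, value) triples; this
    coordinate layout is an assumption of the sketch.
    """
    for row, col, value in matrix_entries:
        yield row, value * vector[col]


def reduce_phase(pairs):
    """Sum the partial products that share the same row key."""
    sums = defaultdict(float)
    for row, partial in pairs:
        sums[row] += partial
    return dict(sums)


def spmv(matrix_entries, vector):
    # The intermediate pairs are passed straight from map to reduce
    # through a generator, so nothing is written to disk in between.
    return reduce_phase(map_phase(matrix_entries, vector))


# 3x3 sparse matrix with four nonzero entries, multiplied by x.
entries = [(0, 0, 2.0), (0, 2, 1.0), (1, 1, 3.0), (2, 0, 4.0)]
x = [1.0, 2.0, 3.0]
result = spmv(entries, x)
print(result)
```

The sketch runs in a single process and therefore shows only the shape of the computation; how M3R distributes and caches data across a cluster is described in the paper itself.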

Taxonomy

Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Graph Theory and Algorithms