Hadoop Mapreduce Performance Enhancement Using In-node Combiners

Woo-Hyun Lee; Hee-Gook Jun; Hyoung-Joo Kim

arXiv:1511.04861·cs.DC·November 17, 2015

TL;DR

This paper proposes an in-node combining technique for Hadoop MapReduce to reduce I/O bottlenecks by minimizing intermediate data and network traffic, thereby enhancing overall performance.

Contribution

It introduces an in-node combiner extension that improves Hadoop MapReduce efficiency by optimizing I/O and reducing network load.

Findings

01. The in-node combiner reduces intermediate data size
02. Network traffic between mappers and reducers is decreased
03. Overall MapReduce performance is enhanced

Abstract

While advanced analysis of large datasets is in high demand, data sizes have surpassed the capabilities of conventional software and hardware. The Hadoop framework distributes large datasets over multiple commodity servers and performs parallel computations. We discuss the I/O bottlenecks of the Hadoop framework and propose methods for enhancing I/O performance. A proven approach is to cache data to maximize memory-locality of all map tasks. We introduce an approach to optimize I/O, the in-node combining design, which extends the traditional combiner to the node level. The in-node combiner reduces the total number of intermediate results and curtails network traffic between mappers and reducers.
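The effect of moving the combine step from the task level to the node level can be illustrated with a toy word count. The sketch below is not the authors' implementation and does not use Hadoop. It is a plain-Python simulation, and every name in it (`map_words`, `combine`, `shipped_records`, the `node_splits` layout) is a hypothetical chosen for illustration. It assumes the reduce function is associative and commutative (a sum), which is what makes combining safe at any level, and it counts how many intermediate records would leave each node for the shuffle.

```python
from collections import Counter


def map_words(split):
    """Plain map step: emit one (word, 1) pair per token in the split."""
    return [(word, 1) for word in split.split()]


def combine(pairs):
    """Sum the values per key. Summation is associative and commutative,
    so it can be applied per task, per node, or at the reducer."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def shipped_records(node_splits, level):
    """Return the intermediate records that would be sent to reducers.

    node_splits: dict mapping a node name to its list of input splits
                 (one simulated map task per split).
    level: "none" (no combiner), "task" (traditional per-task combiner),
           or "node" (combine across all map tasks on the same node).
    """
    shipped = []
    for splits in node_splits.values():
        per_task = [map_words(split) for split in splits]
        if level == "none":
            for output in per_task:
                shipped.extend(output)
        elif level == "task":
            for output in per_task:
                shipped.extend(combine(output))
        elif level == "node":
            merged = [pair for output in per_task for pair in output]
            shipped.extend(combine(merged))
        else:
            raise ValueError(f"unknown combining level: {level}")
    return shipped


def reduce_counts(shipped):
    """Reduce step: the final word counts, independent of combining level."""
    return dict(combine(shipped))


# Hypothetical layout: two nodes, two map tasks (splits) on each.
node_splits = {
    "node1": ["a b a", "a c"],
    "node2": ["b b", "c a"],
}

for level in ("none", "task", "node"):
    records = shipped_records(node_splits, level)
    print(level, len(records), reduce_counts(records))
```

In this toy layout the number of shipped records falls from 9 with no combiner to 7 with a per-task combiner and 6 with node-level combining, while the reduced counts stay identical. The size of the saving in a real cluster depends on how often keys repeat across map tasks on the same node, which this sketch does not attempt to model.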
