Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach
Sandeep Kumar, Sindhu Padakandla, Chandrashekar L, Priyank Parihar, K. Gopinath, Shalabh Bhatnagar

TL;DR
This paper introduces a parameter tuning method for Hadoop MapReduce based on a noisy gradient approach. The method tunes configuration parameters automatically, scales independently of the number of parameters, and reduces job execution times.
Contribution
It presents a new tuning methodology based on simultaneous perturbation stochastic approximation (SPSA) that handles both cross-parameter interactions and the large search space of Hadoop configurations.
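To make the SPSA idea concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the tunable parameters have been normalized to the range [0, 1], and it replaces the measured job execution time with a hypothetical noisy quadratic, `noisy_cost`. The gain constants `a` and `c`, the decay exponents 0.602 and 0.101 (commonly recommended defaults for SPSA), and the names `spsa_minimize`, `true_cost`, and `TARGET` are all illustrative assumptions.

```python
import random


def clip01(x):
    """Keep a normalized parameter inside [0, 1]."""
    return min(1.0, max(0.0, x))


def spsa_minimize(cost, theta, iterations=200, a=0.1, c=0.1, seed=0):
    """Minimize a noisy cost with SPSA.

    Each iteration perturbs all parameters at once and uses exactly two
    cost evaluations, regardless of how many parameters there are.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, iterations + 1):
        a_k = a / k ** 0.602  # step size (assumed decay schedule)
        c_k = c / k ** 0.101  # perturbation size (assumed decay schedule)
        # Random +/-1 perturbation direction applied to every parameter.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [clip01(t + c_k * d) for t, d in zip(theta, delta)]
        minus = [clip01(t - c_k * d) for t, d in zip(theta, delta)]
        diff = cost(plus) - cost(minus)
        # Noisy gradient estimate for coordinate i: diff / (2 * c_k * delta_i).
        theta = [clip01(t - a_k * diff / (2.0 * c_k * d))
                 for t, d in zip(theta, delta)]
    return theta


# Hypothetical stand-in for "run the job and measure its execution time".
TARGET = [0.2, 0.8, 0.4]
_noise = random.Random(1)


def true_cost(theta):
    return sum((t - g) ** 2 for t, g in zip(theta, TARGET))


def noisy_cost(theta):
    return true_cost(theta) + _noise.gauss(0.0, 0.01)


initial = [0.5, 0.5, 0.5]
tuned = spsa_minimize(noisy_cost, initial)
print("initial cost:", true_cost(initial))
print("tuned cost:", true_cost(tuned))
```

In an actual tuning run, `noisy_cost` would map each normalized value back to a real Hadoop setting, run the job, and return the observed execution time. Because every coordinate is perturbed simultaneously, the estimate reflects how parameters act together rather than one at a time.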
Findings
Achieved 66% average reduction in Hadoop job execution time.
Reduced execution times by 45% compared to previous tuning methods.
Validated effectiveness on multiple Hadoop benchmarks.
Abstract
Hadoop MapReduce is a framework for distributed storage and processing of large datasets that is quite popular in big data analytics. It has various configuration parameters (knobs) which play an important role in deciding the performance, i.e., the execution time, of a given big data processing job. Default values of these parameters do not always result in good performance, and hence it is important to tune them. However, tuning the parameters is inherently difficult for two important reasons: first, the parameter search space is large, and second, there are cross-parameter interactions. Hence, there is a need for a dimensionality-free method which can automatically tune the configuration parameters by taking into account the cross-parameter dependencies. In this paper, we propose a novel Hadoop parameter tuning methodology, based on a noisy gradient algorithm known as the…
