An Open-Source Project for MapReduce Performance Self-Tuning
Donghua Chen

TL;DR
This paper introduces an open-source self-tuning system called Catla for Hadoop MapReduce, aiming to simplify and improve performance tuning through optimization techniques, with initial demonstrations showing promising benefits.
Contribution
It presents a new open-source self-tuning system integrating multiple optimization methods to enhance Hadoop MapReduce performance tuning.
Findings
Initial example shows performance improvements
System facilitates easier tuning for users
Open-source availability encourages community development
Abstract
Many Hadoop configuration parameters have a significant influence on the performance of MapReduce jobs running on Hadoop. Manually tuning these parameters for optimal MapReduce performance is time-consuming and tedious for general users. Moreover, most existing self-tuning systems have opaque implementations, making them difficult to use in practice. This study presents an open-source project that hosts a self-tuning system under development, called Catla, to address these issues. Catla integrates multiple direct search and derivative-free optimization-based techniques to improve tuning efficiency for users. This study gives an overview of the system and illustrates its usage. We also report a simple example demonstrating the benefits of this ongoing project. Although this project is still under development and far from comprehensive, it is dedicated to contributing to the Hadoop ecosystem in terms of…
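To make the idea of derivative-free tuning concrete, the following is a minimal sketch of a compass-style pattern search over two Hadoop parameters. It is not Catla's implementation or API. The search ranges, the step sizes, and the `synthetic_runtime` function (a stand-in for actually submitting a job and timing it) are all assumptions made for illustration; only the two property names are real Hadoop settings.

```python
from typing import Callable, Dict, Tuple

Config = Dict[str, int]

# Hypothetical search space: (low, high, initial step) per parameter.
SPACE: Dict[str, Tuple[int, int, int]] = {
    "mapreduce.task.io.sort.mb": (50, 800, 200),
    "mapreduce.job.reduces": (1, 64, 16),
}


def synthetic_runtime(cfg: Config) -> float:
    """Assumed stand-in for running a MapReduce job and measuring seconds."""
    sort_mb = cfg["mapreduce.task.io.sort.mb"]
    reduces = cfg["mapreduce.job.reduces"]
    return 100.0 + 0.001 * (sort_mb - 400) ** 2 + 0.5 * (reduces - 20) ** 2


def pattern_search(
    cost: Callable[[Config], float],
    start: Config,
    space: Dict[str, Tuple[int, int, int]],
    max_evals: int = 200,
) -> Tuple[Config, float]:
    """Compass search: probe +/- step along each parameter, halve steps on failure."""
    best = dict(start)
    best_cost = cost(best)
    evals = 1
    steps = {name: step for name, (_, _, step) in space.items()}

    while evals < max_evals and any(s >= 1 for s in steps.values()):
        improved = False
        for name, (low, high, _) in space.items():
            step = steps[name]
            if step < 1:
                continue  # this parameter cannot be refined further
            for direction in (1, -1):
                candidate = dict(best)
                # Clamp the probe to the allowed range.
                candidate[name] = min(high, max(low, best[name] + direction * step))
                if candidate[name] == best[name]:
                    continue
                candidate_cost = cost(candidate)
                evals += 1
                if candidate_cost < best_cost:
                    best, best_cost = candidate, candidate_cost
                    improved = True
                    break  # move on to the next parameter
        if not improved:
            # No probe helped: shrink every step (integer halving).
            steps = {name: s // 2 for name, s in steps.items()}
    return best, best_cost


if __name__ == "__main__":
    start = {"mapreduce.task.io.sort.mb": 100, "mapreduce.job.reduces": 1}
    config, runtime = pattern_search(synthetic_runtime, start, SPACE)
    print(config, runtime)
```

The search uses only measured cost values and never a gradient, which is why this family of methods suits job tuning: each evaluation is an expensive black-box run. In a real setting the cost function would launch the job with the candidate configuration and return its measured running time.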
Taxonomy
Topics: Cloud Computing and Resource Management · Data Stream Mining Techniques · IoT and Edge/Fog Computing
