ALOJA-ML: A Framework for Automating Characterization and Knowledge Discovery in Hadoop Deployments
Josep Ll. Berral, Nicolas Poggi, David Carrera, Aaron Call, Rob Reinauer, Daron Green

TL;DR
ALOJA-ML leverages machine learning to automate performance modeling and optimization in Hadoop deployments, enabling cost-effective configuration tuning and anomaly detection through systematic analysis of benchmark data.
Contribution
The paper introduces an automated machine learning framework that models Hadoop performance, facilitating predictive analysis, anomaly detection, and efficient benchmarking, advancing beyond manual and expert-guided methods.
Findings
Performance models accurately predict Hadoop execution times.
Models enable anomaly detection and configuration optimization.
Framework reduces operational costs and accelerates knowledge discovery.
Abstract
This article presents ALOJA-Machine Learning (ALOJA-ML), an extension to the ALOJA project that uses machine learning techniques to interpret Hadoop benchmark performance data and guide performance tuning; here we detail the approach, the efficacy of the model, and initial results. Hadoop presents a complex execution environment, where costs and performance depend on a large number of software (SW) configurations and on multiple hardware (HW) deployment choices. These results are accompanied by a test bed and tools to deploy and evaluate the cost-effectiveness of the different hardware configurations, parameter tunings, and Cloud services. Despite early success within ALOJA from expert-guided benchmarking, it became clear that a genuinely comprehensive study requires automation of modeling procedures to allow a systematic analysis of large and resource-constrained search spaces. ALOJA-ML provides…
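The general idea summarized above (learn a model of execution time from benchmark runs, then use it to flag anomalous runs and rank configurations) can be illustrated with a small sketch. This is not the authors' implementation: the nearest-neighbor regressor is a stand-in technique chosen for brevity, and the feature names (mappers, I/O buffer size, SSD flag), the sample timings, the 25% tolerance, and every function name are hypothetical.

```python
import math

# Hypothetical benchmark records: (mappers, io_buffer_kb, uses_ssd) -> seconds.
# All values are invented for illustration; they are not ALOJA data.
RUNS = [
    ((4, 64, 0), 820.0),
    ((4, 128, 0), 790.0),
    ((8, 64, 0), 610.0),
    ((8, 128, 0), 585.0),
    ((4, 64, 1), 540.0),
    ((8, 64, 1), 400.0),
    ((8, 128, 1), 380.0),
]


def _scale(features, lo, hi):
    """Min-max scale each feature to [0, 1] so no feature dominates the distance."""
    return tuple(
        (f - l) / (h - l) if h > l else 0.0
        for f, l, h in zip(features, lo, hi)
    )


def fit_knn(runs):
    """'Training' for k-NN just stores the scaled points and the scaling bounds."""
    xs = [x for x, _ in runs]
    lo = tuple(min(col) for col in zip(*xs))
    hi = tuple(max(col) for col in zip(*xs))
    points = [(_scale(x, lo, hi), t) for x, t in runs]
    return {"lo": lo, "hi": hi, "points": points}


def predict(model, features, k=3):
    """Predict execution time as the mean of the k nearest stored runs."""
    q = _scale(features, model["lo"], model["hi"])
    nearest = sorted(model["points"], key=lambda p: math.dist(p[0], q))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def is_anomaly(model, features, observed, tolerance=0.25):
    """Flag a run whose observed time deviates from the prediction by more
    than `tolerance` (relative error). The threshold is an assumption."""
    expected = predict(model, features)
    return abs(observed - expected) / expected > tolerance


def best_config(model, candidates):
    """Rank candidate configurations by predicted time and return the fastest."""
    return min(candidates, key=lambda c: predict(model, c))


if __name__ == "__main__":
    model = fit_knn(RUNS)
    print(predict(model, (8, 128, 1)))
    print(is_anomaly(model, (8, 128, 1), observed=2000.0))
    print(best_config(model, [(4, 64, 0), (8, 128, 1)]))
```

In a sketch like this, the same fitted model serves both uses: comparing an observed run against its prediction for anomaly detection, and comparing predictions across untested configurations to avoid running every benchmark.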
