An Experimental and Comparative Benchmark Study Examining Resource Utilization in Managed Hadoop Context
Uluer Emre Ozdil, Serkan Ayvaz

TL;DR
This study benchmarks resource utilization in managed Hadoop systems on PaaS, revealing performance variability due to architecture and configuration differences despite similar specifications.
Contribution
It provides an experimental comparison of resource utilization across different managed Hadoop PaaS solutions using standard workloads.
Findings
Performance varies significantly across managed Hadoop services.
System architecture influences resource utilization and performance.
Similar specifications do not ensure consistent performance.
Abstract
Moving cloud-based Hadoop from IaaS to PaaS, which are commercially offered as pay-as-you-go or pay-per-use, often reduces the associated system costs. However, managed Hadoop systems behave as a black box to end users, who have little insight into the internal performance dynamics and therefore into the benefits of adopting them. In this study, we aimed to characterize the managed Hadoop context in terms of resource utilization. We used three experimental Hadoop-on-PaaS offerings in their out-of-the-box configurations and ran the Hadoop-specific workloads of the HiBench Benchmark Suite. During the benchmark executions, we collected system resource utilization data on the worker nodes. The results indicated that the same specifications across cloud services guarantee neither comparable performance between services nor consistent results within a single service. We assume that the managed systems'…
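As an illustration of how the consistency of results within a service could be quantified, the sketch below computes the mean and the coefficient of variation of repeated run durations per service. It is not the authors' method: the function names, the service labels, and the duration values are all hypothetical placeholders, and the coefficient of variation is only one possible consistency measure.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the population standard deviation divided by the mean.

    Returns 0.0 when the mean is zero to avoid dividing by zero.
    """
    avg = mean(values)
    if avg == 0:
        return 0.0
    return pstdev(values) / avg


def summarize_runs(runs):
    """Map each service name to a (mean, coefficient of variation) pair."""
    return {
        name: (mean(durations), coefficient_of_variation(durations))
        for name, durations in runs.items()
    }


# Hypothetical placeholder durations in seconds, NOT results from the study.
example_runs = {
    "service_a": [100.0, 100.0, 100.0],
    "service_b": [80.0, 100.0, 120.0],
}

summary = summarize_runs(example_runs)
for name, (avg, cv) in summary.items():
    print(f"{name}: mean={avg:.1f}s cv={cv:.3f}")
```

A lower coefficient of variation means the repeated runs of a workload on one service were closer to each other. The same summary could be applied to per-node CPU, memory, or I/O samples instead of durations.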
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Caching and Content Delivery
