Data-Intensive Workload Consolidation on Hadoop Distributed File System
Reza Moraveji, Javid Taheri, MohammadReza HosseinyFarahabady, Nikzad Babaii Rizvandi, Albert Y. Zomaya

TL;DR
This paper examines the challenges of workload consolidation in Hadoop, analyzes how last level cache contention relates to throughput degradation, and proposes a greedy algorithm that reaches near-optimal server utilization.
Contribution
It systematically investigates consolidation challenges in Hadoop, models consolidation as a bin packing problem, and introduces an efficient greedy algorithm for near-optimal server utilization.
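To make the bin packing framing concrete, here is a minimal sketch of one well-known greedy heuristic, first-fit decreasing. It is not the paper's algorithm, whose details are not given in this summary. The function name `first_fit_decreasing`, the integer "percent of a server" demand for each workload, and the capacity of 100 are all illustrative assumptions.

```python
from typing import Dict, List


def first_fit_decreasing(demands: Dict[str, int], capacity: int = 100) -> List[List[str]]:
    """Greedily pack workloads onto servers (hypothetical helper, not the paper's method).

    demands maps a workload name to its assumed resource demand, expressed as an
    integer percentage of one server. capacity is the assumed per-server budget.
    """
    servers: List[List[str]] = []  # workload names placed on each server
    remaining: List[int] = []      # spare capacity left on each server

    # Consider the largest demands first; this is the "decreasing" part.
    for name, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > capacity:
            raise ValueError(f"workload {name!r} does not fit on any server")
        # Put the workload on the first server that still has room ("first fit").
        for i, free in enumerate(remaining):
            if demand <= free:
                servers[i].append(name)
                remaining[i] -= demand
                break
        else:
            # No open server can host it, so bring up a new one.
            servers.append([name])
            remaining.append(capacity - demand)
    return servers


if __name__ == "__main__":
    example = {"a": 60, "b": 50, "c": 40, "d": 30, "e": 20}
    print(first_fit_decreasing(example))
```

In this sketch a single number stands in for each workload's demand. The paper's model is built from its empirical cache contention and throughput measurements, which a real implementation would need to substitute for these placeholder values.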
Findings
Greedy algorithm achieves near-optimal solutions.
Cache contention impacts throughput in consolidated workloads.
Modeling consolidation as bin packing is effective.
Abstract
Workload consolidation, the sharing of physical resources among multiple workloads, is a promising technique for saving cost and energy in cluster computing systems. This paper highlights a few challenges of workload consolidation for Hadoop, one of the current state-of-the-art data-intensive cluster computing systems. Through a systematic step-by-step procedure, we investigate challenges for efficient server consolidation in Hadoop environments. To this end, we first investigate the inter-relationship between last level cache (LLC) contention and throughput degradation for consolidated workloads on a single physical server employing the Hadoop distributed file system (HDFS). We then investigate the general case of consolidation on multiple physical servers so that their throughput never falls below a desired/predefined utilization level. We use our empirical results to model consolidation as a…
