A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures
Shantenu Jha, Judy Qiu, Andre Luckow, Pradeep Mantha, Geoffrey C. Fox

TL;DR
This paper compares high-performance computing and Hadoop paradigms for data-intensive applications, analyzing their architectures, workloads, and performance, and proposes using 'Big Data Ogres' as benchmarks for evaluation.
Contribution
It introduces a common framework and terminology for analyzing both paradigms, and presents a semi-quantitative methodology using K-means clustering as a benchmark.
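To illustrate why K-means lends itself to comparison across both paradigms, here is a minimal sketch of the algorithm split into an assignment ("map") step and a centroid update ("reduce") step. This is not the paper's implementation; the function names `assign`, `recompute`, and `kmeans` are hypothetical, and the sketch runs on a single machine using only the Python standard library.

```python
from collections import defaultdict
from math import dist


def assign(points, centroids):
    """Map-like step: pair each point with the index of its nearest centroid."""
    return [
        (min(range(len(centroids)), key=lambda i: dist(p, centroids[i])), p)
        for p in points
    ]


def recompute(assignments, centroids):
    """Reduce-like step: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for idx, p in assignments:
        groups[idx].append(p)
    new_centroids = []
    for i, old in enumerate(centroids):
        members = groups.get(i)
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centroids


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of assign/recompute rounds (no convergence check)."""
    for _ in range(iterations):
        centroids = recompute(assign(points, centroids), centroids)
    return centroids
```

Each iteration reads every point once and then aggregates per cluster, so the same logic could in principle be expressed as a MapReduce job or as a parallel HPC kernel with a reduction. That is one plausible reason it works as a shared benchmark.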
Findings
Architectural similarities despite software differences
Performance insights from K-means clustering experiments
'Big Data Ogres' as a potential benchmarking suite
Abstract
Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, co-placement and scheduling of data with compute resources, and storing and transferring large volumes of data. We analyze the ecosystems of the two prominent paradigms for data-intensive applications, hereafter referred to as the high-performance computing and the Apache-Hadoop paradigm. We propose a basis, common terminology and functional factors upon which to analyze the two approaches of both paradigms. We discuss the concept of "Big Data Ogres" and their facets as means of understanding and characterizing the most common application workloads found across the two paradigms. We then discuss the salient features of the two paradigms, and compare and contrast the two approaches. Specifically, we examine common…
Taxonomy
Topics: Cloud Computing and Resource Management · Scientific Computing and Data Management · Advanced Data Storage Technologies
