Big Data at HPC Wales

Sidharth N. Kashyap; Ade J. Fewings; Jay Davies; Ian Morris; Andrew Thomas; Thomas Green; Martyn F. Guest

arXiv:1506.08907 · cs.DC · July 1, 2015

TL;DR

This paper presents an automated, scalable solution for integrating Big Data frameworks with HPC systems, enabling seamless workload management without dedicated Hadoop clusters.

Contribution

It introduces a dynamic, unified cluster creation method using YARN in HPC environments, supporting multiple frameworks and native HPC integration.

Findings

1. Cluster creation is automated and scalable.
2. Performance results are reported for the Terasort benchmark.
3. APIs in multiple languages let users integrate the workflow into their existing environments.

Abstract

This paper describes an automated approach to handling Big Data workloads on HPC systems. The solution dynamically creates a unified YARN-based cluster in an HPC environment, without the need to configure and allocate a dedicated Hadoop cluster. End users can write their applications in any combination of the supported frameworks, and the resulting solution scales from a few cores to thousands of cores. This coupling of environments creates a platform on which applications can use native HPC solutions alongside the Big Data frameworks. Users are provided with HPC Wales APIs in multiple languages that let them integrate this flow into their own environment, so that the traditional means of HPC access do not become a bottleneck. We describe the behavior of cluster creation and report performance results on Terasort.
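The idea of building a YARN cluster on demand inside an HPC allocation can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the host names, and the choice of the first allocated host as the ResourceManager are assumptions made for illustration. The only YARN-specific detail used is the standard `yarn.resourcemanager.hostname` property.

```python
from xml.etree import ElementTree as ET


def assign_roles(hosts):
    """Split an allocated host list into one master and the remaining workers.

    Hypothetical policy: the first distinct host runs the ResourceManager,
    all other distinct hosts run NodeManagers.
    """
    unique = list(dict.fromkeys(hosts))  # drop duplicates, keep scheduler order
    if len(unique) < 2:
        raise ValueError("need at least two distinct hosts")
    return {"resourcemanager": unique[0], "nodemanagers": unique[1:]}


def render_yarn_site(roles):
    """Render a minimal yarn-site.xml pointing workers at the ResourceManager."""
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = "yarn.resourcemanager.hostname"
    ET.SubElement(prop, "value").text = roles["resourcemanager"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical host names, as a batch scheduler might list them
    # (one entry per allocated slot, so hosts can repeat).
    allocation = ["node001", "node001", "node002", "node003"]
    roles = assign_roles(allocation)
    print(roles)
    print(render_yarn_site(roles))
```

Because the configuration is derived from whatever hosts the scheduler grants, the same logic applies to a two-node job and to a much larger allocation, which is the property the abstract describes as scaling without a dedicated Hadoop cluster.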

Taxonomy

Topics: Cloud Computing and Resource Management · Scientific Computing and Data Management · Distributed and Parallel Computing Systems