Big Data at HPC Wales
Sidharth N. Kashyap, Ade J. Fewings, Jay Davies, Ian Morris, Andrew Thomas, Thomas Green, Martyn F. Guest

TL;DR
This paper presents an automated, scalable solution for integrating Big Data frameworks with HPC systems, enabling seamless workload management without dedicated Hadoop clusters.
Contribution
It introduces a dynamic, unified cluster creation method using YARN in HPC environments, supporting multiple frameworks and native HPC integration.
Findings
Cluster creation is automated and scalable.
Performance results are reported for the Terasort benchmark.
APIs facilitate easy integration into existing workflows.
Abstract
This paper describes an automated approach to handling Big Data workloads on HPC systems. The solution dynamically creates a unified YARN-based cluster in an HPC environment, without the need to configure and allocate a dedicated Hadoop cluster. End users can write their applications in any combination of the supported frameworks, and the resulting solution scales from a few cores to thousands of cores. This coupling of environments creates a platform on which applications can use native HPC solutions alongside the Big Data frameworks. Users are provided with HPC Wales APIs in multiple languages for integrating this flow into their own environment, so that the traditional means of HPC access do not become a bottleneck. We describe the behavior of cluster creation and present performance results on Terasort.
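The abstract does not say how the YARN cluster is assembled from an HPC allocation, so the following is only an illustrative sketch of one plausible approach, not the authors' implementation. It assumes the batch scheduler hands the job a list of host names, picks the first distinct host as the YARN ResourceManager, and treats the rest as workers. The function names and the host names are hypothetical; `yarn.resourcemanager.hostname` is a standard YARN configuration property.

```python
from xml.sax.saxutils import escape


def plan_yarn_cluster(hosts):
    """Split an allocated host list into one ResourceManager host and the workers."""
    # Deduplicate while preserving order: schedulers often list a host once per core.
    unique = list(dict.fromkeys(hosts))
    if len(unique) < 2:
        raise ValueError("need at least two distinct hosts: one master, one worker")
    return unique[0], unique[1:]


def render_yarn_site(resource_manager):
    """Render a minimal yarn-site.xml pointing every node at the ResourceManager."""
    return (
        "<configuration>\n"
        "  <property>\n"
        "    <name>yarn.resourcemanager.hostname</name>\n"
        f"    <value>{escape(resource_manager)}</value>\n"
        "  </property>\n"
        "</configuration>\n"
    )


# Hypothetical host names standing in for a scheduler-provided allocation.
allocated = ["node01", "node01", "node02", "node03"]
master, workers = plan_yarn_cluster(allocated)
yarn_site = render_yarn_site(master)       # contents for yarn-site.xml
workers_file = "\n".join(workers) + "\n"   # one worker host per line
```

In a real job script, the two strings would be written into a per-job Hadoop configuration directory before the daemons are started (the worker list file is named `workers` in Hadoop 3 and `slaves` in Hadoop 2). Generating the configuration per job is what would remove the need for a permanently configured cluster in a setup like this.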
Taxonomy
Topics: Cloud Computing and Resource Management · Scientific Computing and Data Management · Distributed and Parallel Computing Systems
