Hadoop on HPC: Integrating Hadoop and Pilot-based Dynamic Resource Management

Andre Luckow; Ioannis Paraskevakos; George Chantzialexiou; Shantenu Jha

arXiv:1602.00345·cs.DC·February 2, 2016

TL;DR

This paper explores integrating Hadoop with high-performance computing environments through resource management middleware, enabling scientific applications to combine traditional computing with Hadoop-based data analysis.

Contribution

It proposes extensions to the Pilot-Abstraction to unify resource management for HPC and Hadoop, facilitating integrated scientific workflows.

Findings

01 Extended Pilot-Abstraction supports HPC-Hadoop integration

02 Enables coupling of simulation and data analytics stages

03 Provides practical solutions for hybrid environment management

Abstract

High-performance computing platforms such as supercomputers have traditionally been designed to meet the compute demands of scientific applications. Consequently, they have been architected as producers and not consumers of data. The Apache Hadoop ecosystem has evolved to meet the requirements of data processing applications and has addressed many of the limitations of HPC platforms. There exists a class of scientific applications, however, that need the collective capabilities of traditional high-performance computing environments and the Apache Hadoop ecosystem. For example, the scientific domains of bio-molecular dynamics, genomics and network science need to couple traditional computing with Hadoop/Spark-based analysis. We investigate the critical question of how to present the capabilities of both computing environments to such scientific applications. Whereas this question needs…
