Identifying the potential of Near Data Computing for Apache Spark

Ahsan Javed Awan; Mats Brorsson; Vladimir Vlassov; Eduard Ayguade

arXiv:1707.09323·cs.DC·July 31, 2017

TL;DR

This paper explores the potential of Near Data Computing (NDC) architectures, specifically hybrid processing-in-memory and in-storage processing, to improve the performance of Apache Spark for big data analytics.

Contribution

The paper hypothesizes that NDC architectures benefit Apache Spark and investigates this through extensive profiling of Spark workloads on Ivy Bridge servers.

Findings

1. NDC architectures show promise for improving Spark performance.
2. Profiling indicates potential efficiency gains with NDC.
3. Hybrid processing-in-memory and in-storage approaches are beneficial.

Abstract

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has stayed at the forefront of big data analytics by being a unified framework for both batch and stream data processing. There is also renewed interest in Near Data Computing (NDC) due to technological advances over the last decade. However, it is not known whether NDC architectures can improve the performance of big data processing frameworks such as Apache Spark. In this position paper, based on extensive profiling of Apache Spark workloads on an Ivy Bridge server, we hypothesize in favour of an NDC architecture for Apache Spark comprising programmable-logic-based hybrid 2D integrated processing-in-memory and in-storage processing.

Taxonomy

Topics: Cloud Computing and Resource Management · Data Stream Mining Techniques · IoT and Edge/Fog Computing