Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server

Ahsan Javed Awan; Mats Brorsson; Vladimir Vlassov; Eduard Ayguade

arXiv:1506.07742·cs.DC·November 17, 2016

TL;DR

This paper analyzes the micro-architectural performance bottlenecks of in-memory data analytics on modern cloud servers using Apache Spark, highlighting scalability issues and memory latency as key limiting factors.

Contribution

It provides a detailed characterization of Spark workloads on a NUMA architecture, identifying micro-architectural bottlenecks and scalability limitations.

Findings

1. Spark workloads do not scale linearly beyond twelve threads.
2. Memory-bound latency is the primary cause of work time inflation.
3. Workload inefficiencies are linked to memory latency and thread imbalance.
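A simple way to locate where a workload stops scaling linearly, as in the first finding, is to compute speedup and parallel efficiency from wall-clock times measured at several thread counts. The sketch below is illustrative only. It assumes timings are collected into a `{threads: seconds}` dict and treats an efficiency of 0.8 as the cut-off for "near-linear". The function names are hypothetical. The timings in the usage block are made-up numbers, not measurements from the paper.

```python
def scaling_table(runtimes):
    """Map each thread count to (speedup, parallel efficiency).

    runtimes: {thread_count: wall_clock_seconds}. Speedup is relative to the
    smallest thread count present; efficiency is speedup divided by the
    ideal (linear) speedup for that thread count.
    """
    base_threads = min(runtimes)
    base_time = runtimes[base_threads]
    table = {}
    for threads in sorted(runtimes):
        speedup = base_time / runtimes[threads]
        ideal = threads / base_threads
        table[threads] = (speedup, speedup / ideal)
    return table


def scaling_knee(runtimes, threshold=0.8):
    """Last thread count before parallel efficiency first drops below threshold."""
    knee = None
    for threads, (_, efficiency) in scaling_table(runtimes).items():
        if efficiency < threshold:
            break
        knee = threads
    return knee


if __name__ == "__main__":
    # Hypothetical timings for illustration only.
    runtimes = {1: 120.0, 6: 21.0, 12: 11.0, 24: 9.0}
    for threads, (speedup, eff) in scaling_table(runtimes).items():
        print(f"{threads:>2} threads: speedup {speedup:5.2f}, efficiency {eff:.2f}")
    print("near-linear up to", scaling_knee(runtimes), "threads")
```

The efficiency threshold is a judgment call. A stricter value flags the knee earlier, and a looser one tolerates more sublinear growth before reporting it.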

Abstract

In the last decade, data analytics have rapidly progressed from traditional disk-based processing to modern in-memory processing. However, little effort has been devoted to enhancing performance at the micro-architecture level. This paper characterizes the performance of in-memory data analytics using the Apache Spark framework. We use a single-node NUMA machine and identify the bottlenecks hampering the scalability of workloads. We also quantify the inefficiencies at the micro-architecture level for various data analysis workloads. Through empirical evaluation, we show that Spark workloads do not scale linearly beyond twelve threads, due to work time inflation and thread-level load imbalance. Further, at the micro-architecture level, we observe memory-bound latency to be the major cause of work time inflation.
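The abstract attributes the scaling limit to two factors, work time inflation and thread-level load imbalance. One common way to quantify both from per-thread busy times is sketched below. The exact definitions used here are assumptions for illustration and may differ from the paper's formulation: inflation is taken as summed busy time across threads divided by sequential busy time, and imbalance as the busiest thread's time over the mean. The function names and the sample busy times are hypothetical.

```python
from statistics import mean


def work_time_inflation(serial_work_s, per_thread_work_s):
    """Summed busy time over all threads in the parallel run divided by the
    busy time of a sequential run doing the same work (1.0 = no inflation)."""
    return sum(per_thread_work_s) / serial_work_s


def load_imbalance(per_thread_work_s):
    """Busiest thread's work time over the mean (1.0 = perfectly balanced)."""
    return max(per_thread_work_s) / mean(per_thread_work_s)


def predicted_parallel_time(serial_work_s, per_thread_work_s):
    """Ideal time T_serial / n, stretched by inflation and then by imbalance.

    Algebraically this product reduces to the busiest thread's work time,
    which is the phase's wall-clock time only if threads spend no time on
    anything other than this work (an assumption of this sketch).
    """
    n = len(per_thread_work_s)
    ideal = serial_work_s / n
    return (ideal
            * work_time_inflation(serial_work_s, per_thread_work_s)
            * load_imbalance(per_thread_work_s))


if __name__ == "__main__":
    # Hypothetical numbers: 100 s of sequential work split over 4 threads.
    serial = 100.0
    busy = [28.0, 30.0, 31.0, 39.0]
    print("work time inflation:", work_time_inflation(serial, busy))
    print("load imbalance:", load_imbalance(busy))
    print("predicted time:", predicted_parallel_time(serial, busy))
```

Splitting the slowdown this way separates the two effects. Inflation measures how much more total work the threads do than a sequential run, which the abstract ties to memory-bound latency. Imbalance measures how unevenly that work is spread across threads.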
