LERC: Coordinated Cache Management for Data-Parallel Systems

Yinghao Yu; Wei Wang; Jun Zhang; Khaled B. Letaief

arXiv:1708.07941·cs.DC·August 29, 2017

TL;DR

This paper introduces LERC, a cache management policy that speeds up data-parallel tasks by caching all of a task's dependent data blocks together, outperforming policies that optimize the conventional cache hit ratio.

Contribution

The paper proposes the effective cache hit ratio metric and the LERC policy, which caches dependent data blocks as a whole to enhance task completion times in data-parallel systems.

Findings

01

LERC speeds up jobs by up to 37% compared with the least-recently-used (LRU) policy.

02

The effective cache hit ratio correlates better with task performance than the conventional cache hit ratio does.

03

LERC is implemented in Spark and evaluated on Amazon EC2.

Abstract

Memory caches are used aggressively in today's data-parallel frameworks such as Spark, Tez and Storm. By caching input and intermediate data in memory, compute tasks can be sped up by orders of magnitude. To maximize the chance of in-memory data access, existing cache algorithms, whether recency- or frequency-based, settle on cache hit ratio as the optimization objective. However, contrary to conventional belief, we show in this paper that simply pursuing a higher cache hit ratio for individual data blocks does not necessarily translate into faster task completion in data-parallel environments. A data-parallel task typically depends on multiple input data blocks. Unless all of these blocks are cached in memory, no speedup will result. To capture this all-or-nothing property, we propose a more relevant metric, called effective cache hit ratio. Specifically, a cache hit of a data…
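The all-or-nothing property can be illustrated with a small Python sketch. The abstract is cut off before it finishes defining the metric, so the sketch assumes that a cache hit counts as effective only when every input block of the same task is also cached. The task names, block names, and both helper functions are hypothetical and are not taken from the paper or from Spark.

```python
from typing import Dict, List, Set


def hit_ratio(tasks: Dict[str, List[str]], cached: Set[str]) -> float:
    """Conventional metric: fraction of block accesses served from memory."""
    accesses = [block for blocks in tasks.values() for block in blocks]
    if not accesses:
        return 0.0
    hits = sum(1 for block in accesses if block in cached)
    return hits / len(accesses)


def effective_hit_ratio(tasks: Dict[str, List[str]], cached: Set[str]) -> float:
    """Assumed definition: a hit is effective only if ALL input blocks
    of the task that reads it are cached (all-or-nothing)."""
    total = 0
    effective = 0
    for blocks in tasks.values():
        total += len(blocks)
        if all(block in cached for block in blocks):
            effective += len(blocks)
    return effective / total if total else 0.0


# Hypothetical workload: two tasks, each reading two input blocks.
tasks = {"t1": ["a1", "b1"], "t2": ["a2", "b2"]}

# Two cache contents of the same size (two blocks each).
split_cache = {"a1", "a2"}     # one block from each task
grouped_cache = {"a1", "b1"}   # both blocks of task t1

for name, cache in [("split", split_cache), ("grouped", grouped_cache)]:
    print(name, hit_ratio(tasks, cache), effective_hit_ratio(tasks, cache))
```

Both cache contents give the same conventional hit ratio of 0.5, yet only the grouped one lets a task read all of its inputs from memory. Under the assumed definition, the split cache has an effective hit ratio of 0 and the grouped cache 0.5, which is the gap the abstract attributes to optimizing per-block hits.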

Taxonomy

Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques