The Parallel Persistent Memory Model

Guy E. Blelloch; Phillip B. Gibbons; Yan Gu; Charles McGuffey; Julian Shun

arXiv:1805.05580 · cs.DC · June 15, 2018

TL;DR

This paper introduces a parallel computational model with persistent memory that is resilient to processor failures, and develops algorithms and scheduling techniques that efficiently handle failures in large-scale parallel systems.

Contribution

The paper proposes a new fault-tolerant parallel model with persistent memory and designs algorithms and schedulers that remain efficient despite failures.

Findings

1. The scheduler guarantees a bound on expected running time that accounts for processor failures.

2. Efficient algorithms for parallel sorting and other primitives are developed within the model.

3. The framework supports locality-efficient, failure-resilient parallel computation.

Abstract

We consider a parallel computational model that consists of $P$ processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows for each processor to fault with bounded probability, and possibly restart. On faulting all processor state and local ephemeral memory are lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are as fast as existing random access memory, are accessible at the granularity of cache lines, and have the capability of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual. Within the model we develop a framework for developing locality efficient parallel algorithms that are resilient to failures. There are several challenges, including the need to recover…
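To make the fault semantics in the abstract concrete, here is a toy simulation. It is a minimal sketch under stated assumptions, not the paper's framework: the abstract above is truncated before it describes how recovery actually works, so everything here beyond "each processor faults with bounded probability, loses its ephemeral memory, and the persistent memory survives" is hypothetical. The names `PersistentMemory`, `Processor`, `run_task`, and `simulate` are illustrative inventions. Tasks run sequentially with round-robin assignment rather than truly in parallel, and a faulted task simply restarts from scratch until one attempt completes and commits its result to persistent memory.

```python
import random

class PersistentMemory:
    """Hypothetical large shared memory whose contents survive processor faults."""
    def __init__(self):
        self.cells = {}

class Processor:
    """Hypothetical processor with small ephemeral memory that is wiped on a fault."""
    def __init__(self, pid, fault_prob, rng):
        assert 0 <= fault_prob < 1, "fault probability must be bounded below 1"
        self.pid = pid
        self.fault_prob = fault_prob
        self.rng = rng
        self.ephemeral = {}

    def fault(self):
        # All processor state and local ephemeral memory are lost.
        self.ephemeral.clear()

    def run_task(self, key, values, pm):
        """Retry a task until one attempt finishes without faulting; return attempts."""
        attempts = 0
        while True:
            attempts += 1
            # Work happens in ephemeral memory only.
            self.ephemeral["acc"] = sum(values)
            if self.rng.random() < self.fault_prob:
                self.fault()  # partial work vanishes; restart the task
                continue
            # Committing to persistent memory makes the result durable.
            pm.cells[key] = self.ephemeral["acc"]
            return attempts

def simulate(data, num_procs, chunk, fault_prob, seed=0):
    """Sum fixed-size chunks of `data`, assigning chunks to processors round-robin."""
    rng = random.Random(seed)
    pm = PersistentMemory()
    procs = [Processor(p, fault_prob, rng) for p in range(num_procs)]
    total_attempts = 0
    for i, start in enumerate(range(0, len(data), chunk)):
        proc = procs[i % num_procs]
        total_attempts += proc.run_task(i, data[start:start + chunk], pm)
    return pm.cells, total_attempts

cells, attempts = simulate(list(range(100)), num_procs=4, chunk=10, fault_prob=0.3)
print(cells[0], attempts)
```

Whatever the fault probability, the committed results are the same; faults only raise the number of attempts. Restarting a whole task is the simplest possible recovery policy and is used here purely for illustration.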
