# Exploring Memory Persistency Models for GPUs

**Authors:** Zhen Lin, Mohammad Alshboul, Yan Solihin, and Huiyang Zhou

arXiv: 1904.12661 · 2019-04-30

## TL;DR

This paper adapts CPU memory persistency models to GPUs, proposing a pragma-based compiler scheme to express them and using idempotency analysis to cut the cost of undo logging in durable transactions.

## Contribution

The paper introduces GPU-specific persistency models, a pragma-based compiler scheme for expressing them, and idempotency analysis as an optimization that reduces both how often durable transactions log and how large their logs are.

## Key findings

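- CPU persistency models can be adapted to GPUs, where the thread hierarchy offers intuitive scopes for forming epochs and durable transactions.
- Undo logging produces significant performance overheads; idempotency analysis reduces both logging frequency and log size.
- The proposed architecture support shows low overheads in both real-system and simulation evaluations.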
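To make the idempotency point concrete, here is a minimal sketch of one common criterion: a region is safe to re-execute after a failure if it never overwrites a location it read as an input. This is a generic check over a recorded access trace, not the paper's compiler analysis; the function name `is_idempotent` and the `("r" | "w", address)` trace format are assumptions made for illustration.

```python
def is_idempotent(ops):
    """Return True if the region never overwrites one of its own inputs.

    ops: iterable of (kind, address) pairs in program order,
         where kind is "r" for a load and "w" for a store.
    (Hypothetical trace format, chosen only for this sketch.)
    """
    written = set()   # addresses already stored to inside the region
    live_in = set()   # addresses read before any store to them in the region
    for kind, addr in ops:
        if kind == "r":
            if addr not in written:
                live_in.add(addr)
        elif kind == "w":
            if addr in live_in:
                # Re-running the region would read the clobbered value.
                return False
            written.add(addr)
        else:
            raise ValueError(f"unknown access kind: {kind!r}")
    return True


# c[i] = a[i] + b[i]: inputs are never overwritten.
vector_add = [("r", "a"), ("r", "b"), ("w", "c")]
# a[i] = a[i] + 1: the input is overwritten in place.
in_place_increment = [("r", "a"), ("w", "a")]
```

Under this criterion, `vector_add` could simply be re-executed on recovery, while `in_place_increment` would need its old value preserved first, which is where logging comes in.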

## Abstract

Given its high integration density, high speed, byte addressability, and low standby power, non-volatile or persistent memory is expected to supplement/replace DRAM as main memory. Through persistency programming models (which define durability ordering of stores) and durable transaction constructs, the programmer can provide recoverable data structures (RDS), which allow programs to recover to a consistent state after a failure. While persistency models have been well studied for CPUs, they have been neglected for graphics processing units (GPUs). Considering the importance of GPUs as a dominant accelerator for high performance computing, we investigate persistency models for GPUs. GPU applications differ substantially from CPU applications; hence, in this paper we adapt, re-architect, and optimize CPU persistency models for GPUs. We design a pragma-based compiler scheme to express persistency models for GPUs. We identify that the thread hierarchy in GPUs offers intuitive scopes to form epochs and durable transactions. We find that undo logging produces significant performance overheads. We propose to use idempotency analysis to reduce both logging frequency and the size of logs. Through both real-system and simulation evaluations, we show low overheads of our proposed architecture support.
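As a rough illustration of the undo logging mentioned in the abstract, the sketch below models a durable transaction in plain Python: every store first records the old value, and recovery rolls uncommitted stores back in reverse order. It is a host-side simulation under simplifying assumptions (a dict stands in for persistent memory, and the log is assumed to become durable before the data it protects), not the paper's GPU design; `UndoLogTx` and `crash_and_recover_demo` are hypothetical names.

```python
_MISSING = object()  # marks an address that had no value before the store


class UndoLogTx:
    """Toy durable transaction using undo logging (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory  # dict standing in for persistent memory
        self.log = []         # (address, old_value) entries, assumed durable

    def store(self, addr, value):
        # Record the old value before overwriting it.
        self.log.append((addr, self.memory.get(addr, _MISSING)))
        self.memory[addr] = value

    def commit(self):
        # Once the new values are durable, the log is no longer needed.
        self.log.clear()

    def recover(self):
        # Undo uncommitted stores, newest first, to restore the old state.
        for addr, old in reversed(self.log):
            if old is _MISSING:
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old
        self.log.clear()


def crash_and_recover_demo():
    pm = {"x": 1, "y": 2}
    tx = UndoLogTx(pm)
    tx.store("x", 10)
    tx.store("z", 30)
    tx.store("x", 11)
    entries_logged = len(tx.log)  # one log entry per store
    tx.recover()                  # simulate a failure before commit
    return pm, entries_logged
```

Each store here costs an extra log write, which hints at why undo logging can be expensive when many threads store at once, and why skipping the log for regions that can safely be re-executed is attractive.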

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.12661/full.md

## Figures

26 figures with captions in the complete paper: https://tomesphere.com/paper/1904.12661/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/1904.12661/full.md

---
Source: https://tomesphere.com/paper/1904.12661