# HeTM: Transactional Memory for Heterogeneous Systems

**Authors:** Daniel Castro, Paolo Romano, Aleksandar Ilic, Amin M. Khan

arXiv: 1905.00661 · 2020-01-20

## TL;DR

This paper introduces HeTM, a transactional memory abstraction for heterogeneous systems combining CPUs and GPUs, and presents SHeTM, a concrete implementation that reduces programming complexity and improves performance through speculative techniques.

## Contribution

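The paper proposes the Heterogeneous Transactional Memory (HeTM) abstraction, defining its semantics and programming model, and presents SHeTM, a modular implementation that uses speculative techniques to hide CPU-GPU communication latency and minimize inter-device synchronization overhead.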

## Key findings

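- SHeTM uses speculative techniques to hide the large communication latency between CPUs and discrete GPUs and to minimize inter-device synchronization overhead.
- An extensive quantitative study, based on synthetic benchmarks and a port of a popular object caching system, demonstrates SHeTM's efficiency.
- SHeTM's modular design lets either the CPU side or the GPU side adopt the TM implementation (hardware or software) that best fits the workload and the processing unit.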

## Abstract

Modern heterogeneous computing architectures, which couple multi-core CPUs with discrete many-core GPUs (or other specialized hardware accelerators), enable unprecedented peak performance and energy efficiency levels. Unfortunately, though, developing applications that can take full advantage of the potential of heterogeneous systems is a notoriously hard task. This work takes a step towards reducing the complexity of programming heterogeneous systems by introducing the abstraction of Heterogeneous Transactional Memory (HeTM). HeTM provides programmers with the illusion of a single memory region, shared among the CPUs and the (discrete) GPU(s) of a heterogeneous system, with support for atomic transactions. Besides introducing the abstract semantics and programming model of HeTM, we present the design and evaluation of a concrete implementation of the proposed abstraction, which we named Speculative HeTM (SHeTM). SHeTM makes use of a novel design that leverages on speculative techniques and aims at hiding the inherently large communication latency between CPUs and discrete GPUs and at minimizing inter-device synchronization overhead. SHeTM is based on a modular and extensible design that allows for easily integrating alternative TM implementations on the CPU's and GPU's sides, which allows the flexibility to adopt, on either side, the TM implementation (e.g., in hardware or software) that best fits the applications' workload and the architectural characteristics of the processing unit. We demonstrate the efficiency of the SHeTM via an extensive quantitative study based both on synthetic benchmarks and on a porting of a popular object caching system.
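The abstract does not spell out how speculation hides inter-device latency, so the following is only a generic sketch of optimistic execution with deferred validation, not SHeTM's algorithm. It assumes two per-device replicas of the shared region, buffered writes, and a fixed serialization order in which the CPU batch precedes the GPU batch. `Replica`, `synchronize`, and `transfer` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical per-device copy of the shared region (not an SHeTM type)."""
    memory: list
    read_set: set = field(default_factory=set)
    write_log: dict = field(default_factory=dict)

    def read(self, addr):
        # Track the address so the batch can be validated later.
        self.read_set.add(addr)
        # A device sees its own buffered writes before the committed state.
        return self.write_log.get(addr, self.memory[addr])

    def write(self, addr, value):
        # Writes are buffered; nothing is visible to the other device yet.
        self.write_log[addr] = value


def synchronize(cpu, gpu):
    """Validate and merge one speculative batch per device.

    Assumed policy: the CPU batch is serialized first, so the GPU batch is
    valid only if it read nothing that the CPU batch wrote. On conflict the
    GPU batch is discarded. Returns True if the GPU batch committed.
    """
    gpu_ok = not (set(cpu.write_log) & gpu.read_set)
    merged = dict(cpu.write_log)
    if gpu_ok:
        merged.update(gpu.write_log)
    for replica in (cpu, gpu):
        for addr, value in merged.items():
            replica.memory[addr] = value
        replica.read_set.clear()
        replica.write_log.clear()
    return gpu_ok


def transfer(replica, src, dst, amount):
    """Example transaction body: move `amount` between two slots."""
    replica.write(src, replica.read(src) - amount)
    replica.write(dst, replica.read(dst) + amount)


cpu = Replica([100, 100, 100, 100])
gpu = Replica([100, 100, 100, 100])

# Round 1: the devices touch disjoint slots, so both batches commit.
transfer(cpu, 0, 1, 10)
transfer(gpu, 2, 3, 5)
round1_committed = synchronize(cpu, gpu)

# Round 2: the GPU reads slot 1, which the CPU wrote, so its batch is dropped.
transfer(cpu, 0, 1, 10)
transfer(gpu, 1, 2, 5)
round2_committed = synchronize(cpu, gpu)
```

In this sketch each device runs its batch without contacting the other, and the cost of communication is paid once per synchronization round rather than once per transaction. Discarding the GPU batch on conflict is an arbitrary choice made here for brevity.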

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.00661/full.md

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/1905.00661/full.md

## References

60 references — full list in the complete paper: https://tomesphere.com/paper/1905.00661/full.md

---
Source: https://tomesphere.com/paper/1905.00661