# Resilient Work Stealing

**Authors:** Pascal Costanza, Charlotte Herzeel, Wolfgang De Meuter, Roel Wuyts

arXiv: 1706.03539 · 2017-06-13

## TL;DR

This paper introduces Cobra, a shared-memory work-stealing scheduler built on restartable task graphs. It lets computations survive hardware soft errors at the software level, with no performance overhead in the absence of failures and low overhead when failures occur.

## Contribution

The paper presents Cobra, a novel shared-memory work-stealing scheduler that realizes restartable task graphs, so that computations can survive hardware failures caused by soft errors.

## Key findings

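- In the absence of failures, Cobra incurs no performance overhead compared with the work-stealing scheduler of Threading Building Blocks on the PARSEC benchmark suite.
- In the presence of a single failure, Cobra incurs low performance overhead.
- In the presence of multiple failures, Cobra's performance overhead remains low.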

## Abstract

Future generations of processors will exhibit an increase of faults over their lifetime, and it becomes increasingly expensive to solve the resulting reliability issues purely at the hardware level. We propose to model computations in terms of restartable task graphs in order to improve reliability at the software level. As a proof of concept, we present Cobra, a novel design for a shared-memory work-stealing scheduler that realizes this notion of restartable task graphs, and enables computations to survive hardware failures due to soft errors. A comparison with the work-stealing scheduler of Threading Building Blocks on the PARSEC benchmark suite shows that Cobra incurs no performance overhead in the absence of failures, and low performance overheads in the presence of single and multiple failures.
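To make the idea of a restartable task graph concrete, here is a minimal single-threaded sketch. It is not Cobra's design, which this summary does not describe in detail. It assumes that tasks are side-effect-free functions of their dependencies' results, so re-running a task after a detected fault is safe. The names `Task`, `SoftError`, `run_graph`, and the `faults` injection map are all hypothetical, and the "workers" are simulated in a loop rather than run as real threads.

```python
from collections import deque


class SoftError(Exception):
    """Stands in for a detected hardware soft error (hypothetical)."""


class Task:
    def __init__(self, name, fn, deps=()):
        self.name = name
        self.fn = fn            # assumed pure: depends only on dep results
        self.deps = list(deps)  # tasks whose results this task consumes
        self.result = None
        self.done = False
        self.attempts = 0


def run_graph(tasks, n_workers=2, faults=None, max_attempts=3):
    """Run a task graph on simulated workers with stealing and restart.

    `faults` maps a task name to the number of injected failures it
    suffers before it is allowed to succeed.
    """
    faults = dict(faults or {})
    deques = [deque() for _ in range(n_workers)]
    waiting = list(tasks)
    log = []  # (worker index, task name, outcome)

    while waiting or any(deques):
        # Release ready tasks onto worker 0 only, so that the other
        # workers have to steal in order to get any work.
        for t in [t for t in waiting if all(d.done for d in t.deps)]:
            waiting.remove(t)
            deques[0].append(t)
        if not any(deques):
            raise RuntimeError("cycle or unsatisfiable dependency")

        for w in range(n_workers):
            if deques[w]:
                task = deques[w].pop()        # owner takes its newest task
            else:
                victim = next((d for d in deques if d), None)
                if victim is None:
                    continue
                task = victim.popleft()       # thief takes the oldest task
            task.attempts += 1
            try:
                if faults.get(task.name, 0) > 0:
                    faults[task.name] -= 1
                    raise SoftError(task.name)
                task.result = task.fn(*[d.result for d in task.deps])
                task.done = True
                log.append((w, task.name, "ok"))
            except SoftError:
                if task.attempts >= max_attempts:
                    raise
                log.append((w, task.name, "restart"))
                deques[w].append(task)        # re-enqueue: the restart
    return log


# Usage: c = a + b, with one injected fault while running c.
a = Task("a", lambda: 1)
b = Task("b", lambda: 2)
c = Task("c", lambda x, y: x + y, deps=[a, b])
log = run_graph([a, b, c], faults={"c": 1})
```

In this sketch the failed task is simply pushed back onto a deque and picked up again, which only works because the task does not mutate shared state before it fails. A real scheduler running on unreliable hardware would also need a way to detect the fault and to protect its own bookkeeping, both of which are outside the scope of this illustration.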

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.03539/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1706.03539/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/1706.03539/full.md

---
Source: https://tomesphere.com/paper/1706.03539