# The DEEP-ER project: I/O and resiliency extensions for the Cluster-Booster architecture

**Authors:** Anke Kreuzer, Norbert Eicker, Jorge Amaya, Raphael Leger, Estela Suarez

arXiv: 1904.07725 · 2025-05-28

## TL;DR

The DEEP-ER project extended the heterogeneous Cluster-Booster HPC architecture with a multi-level memory hierarchy (non-volatile and network-attached memory) and an I/O and resiliency software stack, improving I/O performance and fault tolerance for scientific applications.

## Contribution

The paper presents hardware and software extensions to the Cluster-Booster architecture that improve I/O capabilities and recovery from hardware faults while keeping applications portable.

## Key findings

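- Improved I/O performance, demonstrated with real-world scientific codes.
- Enhanced fault tolerance through an I/O and resiliency software stack that extends established libraries and tools.
- Application portability maintained despite the hardware extensions, thanks to standard user interfaces.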

## Abstract

The recently completed research project DEEP-ER has developed a variety of hardware and software technologies to improve the I/O capabilities of next-generation high-performance computers and to enable applications to recover from the higher hardware failure rates expected on these machines. The heterogeneous Cluster-Booster architecture (first introduced in the predecessor DEEP project) has been extended by a multi-level memory hierarchy employing non-volatile and network-attached memory devices. Based on this hardware infrastructure, an I/O and resiliency software stack has been implemented that combines and extends well-established libraries and software tools while adhering to standard user interfaces. Real-world scientific codes have tested the project's developments and demonstrated the improvements achieved without compromising the portability of the applications.
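To illustrate why a multi-level memory hierarchy helps resiliency, here is a minimal sketch of a generic multi-level checkpoint policy. This is not the project's implementation. The tier names `"nvm"` (fast, node-local, lost when its node fails) and `"nam"` (slower, network-attached, survives a node failure), the checkpoint intervals, and the toy solver are all hypothetical assumptions. Frequent checkpoints go to the fast tier and occasional ones to the durable tier, and a restart resumes from the newest checkpoint that survived the failure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Tier:
    name: str      # hypothetical label, e.g. "nvm" (node-local) or "nam" (network-attached)
    interval: int  # write a checkpoint to this tier every `interval` steps


def tier_for_step(step: int, tiers: Sequence[Tier]) -> Optional[Tier]:
    """Return the most durable tier due at `step` (tiers ordered fast -> durable), or None."""
    chosen = None
    for tier in tiers:
        if step % tier.interval == 0:
            chosen = tier
    return chosen


def simulate(num_steps: int, tiers: Sequence[Tier],
             fail_at: Optional[int] = None,
             lost: Sequence[str] = ()) -> Tuple[int, int]:
    """Run a toy iterative solver with multi-level checkpoints and at most one failure.

    `lost` names the tiers whose contents the failure destroys (hypothetical model).
    Returns (final_state, steps_recomputed).
    """
    store: Dict[str, Tuple[int, int]] = {}  # tier name -> (step, state) of its latest checkpoint
    step, state, recomputed, failed = 0, 0, 0, False
    while step < num_steps:
        step += 1
        state += step  # toy "computation": running sum of step indices
        tier = tier_for_step(step, tiers)
        if tier is not None:
            store[tier.name] = (step, state)
        if step == fail_at and not failed:
            failed = True
            for name in lost:  # the failure wipes these tiers
                store.pop(name, None)
            # Restart from the newest surviving checkpoint, or from scratch if none survived.
            restart_step, restart_state = max(store.values(), default=(0, 0))
            recomputed += step - restart_step
            step, state = restart_step, restart_state
    return state, recomputed
```

With `tiers = [Tier("nvm", 10), Tier("nam", 50)]` and `num_steps=100`, the final state is always 5050. A failure at step 63 that leaves the node-local tier intact costs 3 recomputed steps (restart from step 60). A failure that also destroys the `"nvm"` tier costs 13 (restart from the step-50 `"nam"` checkpoint). This is the trade-off between cheap, frequent checkpoints and rarer, more durable ones that a multi-level hierarchy allows an application to tune.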

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.07725/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/1904.07725/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1904.07725/full.md

---
Source: https://tomesphere.com/paper/1904.07725