# Exploring the Performance Benefit of Hybrid Memory System on HPC Environments

**Authors:** Ivy Bo Peng, Roberto Gioiosa, Gokcen Kestor, Erwin Laure, Stefano Markidis

arXiv: 1704.08273 · 2017-06-07

## TL;DR

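This paper evaluates how a hybrid memory system, high-bandwidth memory (HBM, provided as MCDRAM on Intel KNL) combined with conventional DRAM, affects HPC application performance. Applications with regular memory access achieve up to 3x the performance of DRAM-only runs, while latency-bound applications with random access may slow down when using only MCDRAM.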

## Contribution

It quantifies how four factors (memory access pattern, problem size, threading level, and memory configuration) affect application performance on the Intel KNL hybrid memory system, using a set of applications representative of scientific and data-analytics workloads.

## Key findings

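- Applications with regular memory access achieve up to 3x the performance obtained using only DRAM when they use MCDRAM.
- Applications with random memory access are latency-bound and may suffer performance degradation when using only MCDRAM.
- For those latency-bound applications, additional hardware threads may help hide latency and achieve higher aggregated bandwidth when using HBM.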
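
To make the distinction between regular and random access concrete, here is a minimal sketch. It is not taken from the paper: the function names (`make_indices`, `gather_sum`, `mean_stride`) are hypothetical, and Python lists will not reproduce the memory behaviour of a compiled HPC kernel. It only shows that the same computation can walk memory in a streaming order or in a scattered order, which is the property the findings above depend on.

```python
import random


def make_indices(n, regular, seed=0):
    """Return the order in which n elements are visited.

    regular=True  -> 0, 1, 2, ... (unit stride, streaming access)
    regular=False -> a seeded random permutation (scattered access)
    """
    idx = list(range(n))
    if not regular:
        random.Random(seed).shuffle(idx)
    return idx


def gather_sum(data, indices):
    """Sum data[i] in the order given by indices."""
    total = 0
    for i in indices:
        total += data[i]
    return total


def mean_stride(indices):
    """Average absolute distance between consecutively visited indices."""
    if len(indices) < 2:
        return 0.0
    steps = [abs(b - a) for a, b in zip(indices, indices[1:])]
    return sum(steps) / len(steps)


if __name__ == "__main__":
    n = 1000
    data = list(range(n))
    for regular in (True, False):
        order = make_indices(n, regular)
        label = "regular" if regular else "random"
        print(label, gather_sum(data, order), mean_stride(order))
```

Both orders produce the same sum; only the stride between consecutive accesses differs. In a compiled kernel, the unit-stride version is the kind of access that can exploit higher memory bandwidth, while the scattered version tends to be limited by the latency of each individual access.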

## Abstract

Hardware accelerators have become a de-facto standard to achieve high performance on current supercomputers and there are indications that this trend will increase in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional DRAM memory. Theoretically, HBM can provide 5x higher bandwidth than conventional DRAM. However, many factors impact the effective performance achieved by applications, including the application memory access pattern, the problem size, the threading level and the actual memory configuration. In this paper, we analyze the Intel KNL system and quantify the impact of the most important factors on the application performance by using a set of applications that are representative of scientific and data-analytics workloads. Our results show that applications with regular memory access benefit from MCDRAM, achieving up to 3x performance when compared to the performance obtained using only DRAM. On the contrary, applications with random memory access pattern are latency-bound and may suffer from performance degradation when using only MCDRAM. For those applications, the use of additional hardware threads may help hide latency and achieve higher aggregated bandwidth when using HBM.
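
One common way to reason about why extra hardware threads can raise the bandwidth achieved by latency-bound code is Little's law. The relation below is a general queueing identity, not a formula quoted from the paper, and the symbols are introduced here for illustration only: $B$ is the sustained bandwidth, $N$ the number of memory requests in flight, $S$ the bytes moved per request, $L$ the memory latency, $t$ the number of hardware threads, and $n$ the requests each thread can keep outstanding.

$$
B \approx \frac{N \cdot S}{L}, \qquad N = t \cdot n \;\;\Rightarrow\;\; B \approx \frac{t \cdot n \cdot S}{L}
$$

Under these assumptions, when $L$ is fixed and each thread can keep only a few requests in flight, increasing $t$ raises $N$ and therefore $B$, up to the peak bandwidth of the memory.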

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.08273/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1704.08273/full.md

---
Source: https://tomesphere.com/paper/1704.08273