# When parallel speedups hit the memory wall

**Authors:** Alex F. A. Furtunato, Kyriakos Georgiou, Kerstin Eder, Samuel Xavier-de-Souza

arXiv: 1905.01234 · 2020-05-11

## TL;DR

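This paper introduces analytical models of parallel speedup that incorporate the memory wall by accounting for variations in the average data-access delay across hardware configurations, such as the number of cores used and the ratio between processor and memory frequencies. The models are validated with experiments on hardware.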

## Contribution

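The work develops analytical speedup models that explicitly include variations in the average data-access delay caused by the memory wall, which earlier models treated as constant. Because the models use parameters that reflect intrinsic application characteristics (degree of parallelism and susceptibility to the memory wall), they are interpretable rather than black-box, and they reach the accuracy of conventional machine-learning modeling with about one order of magnitude fewer measurements.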

## Key findings

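- Analytical results indicate that the proposed models capture the limiting effect of the memory wall on parallel speedup.
- Experiments on hardware validate the analytical results.
- Conventional machine-learning modeling needs about one order of magnitude more measurements to reach the same accuracy as the proposed models.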

## Abstract

After Amdahl's trailblazing work, many other authors proposed analytical speedup models but none have considered the limiting effect of the memory wall. These models exploited aspects such as problem-size variation, memory size, communication overhead, and synchronization overhead, but data-access delays are assumed to be constant. Nevertheless, such delays can vary, for example, according to the number of cores used and the ratio between processor and memory frequencies. Given the large number of possible configurations of operating frequency and number of cores that current architectures can offer, suitable speedup models to describe such variations among these configurations are quite desirable for off-line or on-line scheduling decisions. This work proposes new parallel speedup models that account for variations of the average data-access delay to describe the limiting effect of the memory wall on parallel speedups. Analytical results indicate that the proposed modeling can capture the desired behavior while experimental hardware results validate the former. Additionally, we show that when accounting for parameters that reflect the intrinsic characteristics of the applications, such as degree of parallelism and susceptibility to the memory wall, our proposal has significant advantages over machine-learning-based modeling. Moreover, besides being black-box modeling, our experiments show that conventional machine-learning modeling needs about one order of magnitude more measurements to reach the same level of accuracy achieved in our modeling.
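To make the abstract's central point concrete, the sketch below compares classic Amdahl speedup, which implicitly treats data-access delay as constant, with a toy variant in which the average data-access delay grows with the number of cores. The toy variant is an illustrative assumption only and is not the model proposed in the paper: the function `toy_memory_wall_speedup` and its parameters `memory_share` and `contention` are hypothetical names introduced here.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Classic Amdahl speedup: 1 / ((1 - f) + f / p)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def toy_memory_wall_speedup(
    parallel_fraction: float,
    cores: int,
    memory_share: float,
    contention: float,
) -> float:
    """Hypothetical illustration, NOT the paper's model.

    Single-core run time is normalised to 1. A fraction `memory_share`
    of the parallel work is data access, and that part is stretched by
    (1 + contention * (cores - 1)) to mimic an average data-access
    delay that grows as more cores compete for memory.
    """
    if not 0.0 <= memory_share <= 1.0:
        raise ValueError("memory_share must be in [0, 1]")
    if contention < 0.0:
        raise ValueError("contention must be >= 0")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")

    serial_time = 1.0 - parallel_fraction
    # Assumed linear growth of the data-access delay with core count.
    delay_factor = 1.0 + contention * (cores - 1)
    per_core_work = (1.0 - memory_share) + memory_share * delay_factor
    parallel_time = parallel_fraction / cores * per_core_work
    return 1.0 / (serial_time + parallel_time)


if __name__ == "__main__":
    # Print both curves side by side for a few core counts.
    for p in (1, 2, 4, 8, 16, 32):
        ideal = amdahl_speedup(0.95, p)
        limited = toy_memory_wall_speedup(0.95, p, memory_share=0.4, contention=0.05)
        print(f"cores={p:2d}  amdahl={ideal:6.2f}  toy_memory_wall={limited:6.2f}")
```

With `contention=0` the toy variant reduces exactly to Amdahl's law, so any gap between the two curves comes solely from the assumed growth in data-access delay. The linear growth term is chosen for simplicity, not because the paper uses it.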

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.01234/full.md

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/1905.01234/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1905.01234/full.md

---
Source: https://tomesphere.com/paper/1905.01234