Interference analysis of shared last-level cache on embedded GP-GPUs with multiple CUDA streams

Gianluca Brilli; Paolo Burgio

arXiv:2310.04848 · cs.DC · October 10, 2023

Open Access

TL;DR

This paper analyzes how shared last-level cache interference affects embedded GP-GPUs when multiple CUDA streams run concurrently, focusing on data access efficiency and system predictability.

Contribution

It provides a qualitative analysis of cache interference effects caused by concurrent CUDA streams on embedded GP-GPUs, highlighting the impact on performance and predictability.

Findings

01

Interference from concurrent streams affects data access efficiency.

02

Shared cache contention impacts execution time of GPU kernels.

03

Different primitives exhibit varying sensitivity to interference.

Abstract

In modern heterogeneous architectures, efficient access to the data an application needs is key to making a compute task efficient in terms of power dissipation and execution time. New-generation SoCs are equipped with large last-level caches (LLCs) to make data access as efficient as possible. However, these systems add a new level of complexity to system predictability, because concurrent tasks must compete for the same resource and interfere with one another. This paper provides a preliminary qualitative analysis of the degree of interference generated when several streams execute concurrently, for example one that performs useful computation and one that generates interference. Specifically, we tested two important primitives, vadd and gemm, each subjected to interference from: i) a concurrent…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Advanced Data Storage Technologies