Improving Multi-Application Concurrency Support Within the GPU Memory System

Rachata Ausavarungnirun; Christopher J. Rossbach; Vance Miller; Joshua Landgraf; Saugata Ghose; Jayneel Gandhi; Adwait Jog; Onur Mutlu

arXiv:1708.04911·cs.AR·August 17, 2017·2 cites

Open Access

TL;DR

This paper identifies the memory system as a key bottleneck in multi-application GPU execution and proposes MASK, a new memory hierarchy design that improves virtual memory support and reduces contention.

Contribution

The paper introduces MASK, a novel GPU memory hierarchy extension that enhances multi-application concurrency by reducing TLB contention and improving address translation efficiency.

Findings

01

MASK reduces TLB miss rates significantly.

02

MASK improves GPU throughput during multi-application workloads.

03

MASK decreases inter-core thrashing in the GPU memory system.

Abstract

GPUs exploit a high degree of thread-level parallelism to hide long-latency stalls. Due to the heterogeneous compute requirements of different applications, there is a growing need to share the GPU across multiple applications in large-scale computing environments. However, while CPUs offer relatively seamless multi-application concurrency, and are an excellent fit for multitasking and for virtualized environments, GPUs currently offer only primitive support for multi-application concurrency. Much of the problem in a contemporary GPU lies within the memory system, where multi-application execution requires virtual memory support to manage the address space of each application and to provide memory protection. In this work, we perform a detailed analysis of the major problems in state-of-the-art GPU virtual memory management that hinder multi-application execution. Existing GPUs are…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies