Characterizing Concurrency Mechanisms for NVIDIA GPUs under Deep Learning Workloads

Guin Gilman; Robert J. Walls

arXiv:2110.00459 · cs.DC · October 4, 2021

Open Access

TL;DR

This paper analyzes the concurrency mechanisms of NVIDIA's Ampere GPUs under deep learning training and inference workloads. It finds that limitations in preemption, task prioritization, and thread block placement make it hard to keep utilization high and turnaround times predictable.

Contribution

The paper examines GPU scheduling at the microarchitectural level rather than treating the GPU as a black box, and identifies the limitations of NVIDIA's concurrency mechanisms that affect deep learning training and inference performance.

Findings

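01. The lack of fine-grained preemption limits how effectively concurrent workloads can be scheduled.

02. Resource contention and the absence of contention-aware thread block placement reduce GPU utilization.

03. The fluctuating resource requirements and kernel runtimes of deep learning workloads challenge current GPU concurrency mechanisms.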

Abstract

We investigate the performance of the concurrency mechanisms available on NVIDIA's new Ampere GPU microarchitecture under deep learning training and inference workloads. In contrast to previous studies that treat the GPU as a black box, we examine scheduling at the microarchitectural level. We find that the lack of fine-grained preemption mechanisms, robust task prioritization options, and contention-aware thread block placement policies limits the effectiveness of NVIDIA's concurrency mechanisms. In summary, the sequential nature of deep learning workloads and their fluctuating resource requirements and kernel runtimes make executing such workloads while maintaining consistently high utilization and low, predictable turnaround times difficult on current NVIDIA hardware.
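
To make the preemption point concrete, here is a minimal toy sketch in Python. It is not the paper's methodology and does not model NVIDIA's actual scheduler. It assumes a single low-priority kernel that starts at time 0 and occupies the whole device, and a device that can switch tasks only at multiples of a hypothetical non-preemptible unit (`unit_ms`) or when the kernel ends. The function name and all parameters are illustrative assumptions.

```python
import math


def high_priority_wait(arrival_ms: float, low_kernel_ms: float, unit_ms: float) -> float:
    """Return how long a high-priority task waits before it can start.

    Toy assumptions (not taken from the paper):
      * one low-priority kernel starts at t = 0 and lasts low_kernel_ms;
      * it occupies the whole device while it runs;
      * the device can switch tasks only at multiples of unit_ms
        or at the moment the low-priority kernel finishes.
    """
    if unit_ms <= 0:
        raise ValueError("unit_ms must be positive")
    if arrival_ms >= low_kernel_ms:
        return 0.0  # the device is already idle when the task arrives
    # First switch point at or after the arrival time.
    next_boundary = math.ceil(arrival_ms / unit_ms) * unit_ms
    # The task starts at that boundary or at kernel completion, whichever is first.
    start = min(next_boundary, low_kernel_ms)
    return start - arrival_ms


if __name__ == "__main__":
    # Coarse granularity: the only switch point is the end of the 10 ms kernel.
    print(high_priority_wait(1.0, 10.0, 10.0))  # 9.0
    # Fine granularity: a switch point exists exactly at the arrival time.
    print(high_priority_wait(1.0, 10.0, 0.5))   # 0.0
```

Under these assumptions, the wait of a newly arrived high-priority task is bounded by the size of the non-preemptible unit. Coarse preemption combined with long or variable kernel runtimes therefore makes turnaround times harder to predict.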

Taxonomy

Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques