A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs

Lingqi Zhang; Mohamed Wahib; Haoyu Zhang; Satoshi Matsuoka

arXiv:2004.05371 · cs.DC · April 14, 2020 · 1 citation

TL;DR

This paper analyzes the performance and characteristics of synchronization methods in Nvidia GPUs, providing insights for optimizing single and multi-GPU applications.

Contribution

It offers an in-depth analysis of undocumented features and performance considerations of Nvidia GPU synchronization methods, to inform design choices for applications, libraries, and frameworks.

Findings

1. Identifies key performance pitfalls of synchronization methods
2. Provides micro-benchmarks for measuring synchronization performance
3. Presents a case study on the reduction operator that illustrates the practical implications

Abstract

GPUs are playing an increasingly important role in general-purpose computing. Many algorithms require synchronization at different levels of granularity within a single GPU. Additionally, the emergence of dense GPU nodes calls for multi-GPU synchronization. Nvidia's latest CUDA provides a variety of synchronization methods, yet until now their characteristics have not been fully understood. This work explores important undocumented features and provides an in-depth analysis of the performance considerations and pitfalls of state-of-the-art synchronization methods for Nvidia GPUs. The analysis should be useful when making design choices for applications, libraries, and frameworks running in single- and/or multi-GPU environments. We provide a case study of the commonly used reduction operator to illustrate how the knowledge gained in our analysis…
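To make "levels of granularity within a single GPU" concrete, here is a minimal sketch (ours, not the authors' code and not the paper's reduction case study) of a sum reduction that synchronizes at three levels in one kernel: warp-level shuffles (`__shfl_down_sync`), a thread-block barrier (`__syncthreads`), and a grid-wide barrier from CUDA cooperative groups (`cooperative_groups::this_grid().sync()`). The grid barrier only works under a cooperative launch with every block resident at once, so the host code sizes the grid from occupancy. The names `reduce_all_levels` and `gpu_sum` and the block size of 256 are hypothetical choices for illustration.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>

namespace cg = cooperative_groups;

// Hypothetical kernel: sums in[0..n) into *out using warp-, block-, and grid-level sync.
// Assumes blockDim.x is a multiple of 32 and that the kernel is launched cooperatively.
__global__ void reduce_all_levels(const float* in, float* partial, float* out, int n) {
    extern __shared__ float warp_sums[];

    // Each thread accumulates a private partial sum with a grid-stride loop.
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        sum += in[i];

    // Warp level: full-mask shuffles exchange values among the 32 lanes of a warp.
    for (int offset = 16; offset > 0; offset >>= 1)
        sum += __shfl_down_sync(0xffffffffu, sum, offset);

    // Block level: one value per warp goes to shared memory, then a block-wide barrier.
    int lane = threadIdx.x % 32, warp = threadIdx.x / 32;
    if (lane == 0) warp_sums[warp] = sum;
    __syncthreads();
    if (warp == 0) {
        int num_warps = blockDim.x / 32;
        sum = (lane < num_warps) ? warp_sums[lane] : 0.0f;
        for (int offset = 16; offset > 0; offset >>= 1)
            sum += __shfl_down_sync(0xffffffffu, sum, offset);
        if (lane == 0) partial[blockIdx.x] = sum;
    }

    // Grid level: no thread continues until every block has written its partial sum.
    cg::this_grid().sync();
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        float total = 0.0f;
        for (int b = 0; b < gridDim.x; ++b) total += partial[b];
        *out = total;
    }
}

// Hypothetical host wrapper: returns the sum of host[0..n), or -1.0f if the device
// cannot run cooperative kernels or the launch fails.
float gpu_sum(const float* host, int n) {
    assert(n > 0);
    const int threads = 256;  // a multiple of 32, as the kernel assumes
    int dev = 0, coop = 0, sms = 0, blocks_per_sm = 0;
    cudaGetDevice(&dev);
    cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
    if (!coop) return -1.0f;
    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev);

    // A grid-wide sync needs all blocks co-resident, so size the grid by occupancy.
    size_t smem = (threads / 32) * sizeof(float);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks_per_sm, reduce_all_levels, threads, smem);
    int blocks = sms * blocks_per_sm;
    if (blocks == 0) return -1.0f;

    float *in = nullptr, *partial = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&partial, blocks * sizeof(float));
    cudaMalloc(&out, sizeof(float));
    cudaMemcpy(in, host, n * sizeof(float), cudaMemcpyHostToDevice);

    // Kernel arguments are passed as an array of pointers to each argument.
    void* args[] = {&in, &partial, &out, &n};
    float result = -1.0f;
    if (cudaLaunchCooperativeKernel((void*)reduce_all_levels, blocks, threads, args, smem, 0) == cudaSuccess)
        cudaMemcpy(&result, out, sizeof(float), cudaMemcpyDeviceToHost);  // waits for the kernel

    cudaFree(in);
    cudaFree(partial);
    cudaFree(out);
    return result;
}
```

With nvcc from CUDA 11 or later this should build as a single file; older toolkits required relocatable device code (`-rdc=true`) for grid synchronization. The common alternative to the grid barrier is splitting the work into two kernel launches; which option is cheaper on a given GPU is a measurement question that this sketch does not settle.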

Code & Models

Repositories

neozhang307/SyncMicrobenchmark (official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Interconnection Networks and Systems