Batched matrix operations on distributed GPUs with application in theoretical physics

Nenad Mijić; Davor Davidović

arXiv:2203.09353·cs.DC·March 18, 2022·1 cites

TL;DR

This paper proposes an alternative to batched GEMM routines on distributed GPUs: many small GEMM operations are assigned to MPI ranks, and multiple ranks are bound to a single GPU to raise its utilization. The approach is demonstrated on large-scale quantum physics simulations, with speed-ups of up to 35x over CPU-only implementations.

Contribution

Proposes linking multiple GEMM operations to MPI ranks and binding multiple MPI ranks to a single GPU, so that many small GEMM operations run concurrently on it, improving performance in large-scale physics simulations.

Findings

01 Achieved up to 35x speed-up over CPU-only implementations.

02 Enabled simulation of larger quantum spin systems.

03 Demonstrated effectiveness in theoretical physics applications.

Abstract

Matrix-matrix multiplication (GEMM) is one of the most important and commonly used operations in linear algebra, and a key component in the performance of many scientific codes. It requires $O(n^{3})$ operations, and this high computational intensity makes it well suited to acceleration on GPUs. Today, many research problems require computing a very large number of relatively small GEMM operations, none of which can utilise the entire GPU on its own. To overcome this bottleneck, special functions have been developed that pack several GEMM operations into one and compute them simultaneously on a GPU; this is called a batch operation. In this research work, we propose a different approach based on linking multiple GEMM operations to MPI ranks and then binding multiple MPI ranks to a single GPU. To increase…
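The two steps named in the abstract, linking GEMM operations to MPI ranks and binding several ranks to one GPU, can be illustrated with a minimal sketch. It is not the authors' implementation: the excerpt does not say how work is partitioned or how ranks are mapped, so the contiguous split and the round-robin binding below are assumptions, and the function names `assign_gemms_to_ranks`, `bind_rank_to_gpu`, and `ranks_per_gpu` are hypothetical. No MPI or GPU library is called; the code only computes the index mappings.

```python
def assign_gemms_to_ranks(num_gemms, num_ranks):
    """Split GEMM indices 0..num_gemms-1 into contiguous, near-equal chunks.

    Assumption: a simple block partition; the paper's actual scheme may differ.
    """
    base, extra = divmod(num_gemms, num_ranks)
    chunks = []
    start = 0
    for rank in range(num_ranks):
        # The first `extra` ranks take one additional GEMM each.
        size = base + (1 if rank < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def bind_rank_to_gpu(local_rank, gpus_per_node):
    """Round-robin binding (assumed): several node-local ranks share each GPU."""
    return local_rank % gpus_per_node


def ranks_per_gpu(ranks_per_node, gpus_per_node):
    """Invert the binding: list the node-local ranks that share each GPU."""
    mapping = {gpu: [] for gpu in range(gpus_per_node)}
    for local_rank in range(ranks_per_node):
        mapping[bind_rank_to_gpu(local_rank, gpus_per_node)].append(local_rank)
    return mapping


if __name__ == "__main__":
    # 10 small GEMMs over 4 ranks: chunk sizes 3, 3, 2, 2.
    print([list(c) for c in assign_gemms_to_ranks(10, 4)])
    # 8 ranks on a node with 2 GPUs: 4 ranks share each GPU.
    print(ranks_per_gpu(8, 2))
```

In an actual MPI program, the node-local rank would typically come from the launcher or a node-local communicator, and the computed GPU index would be passed to a device-selection call such as CUDA's `cudaSetDevice` before any GEMM is issued.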

Taxonomy

Topics: Matrix Theory and Algorithms · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques