Scalable Multi-node Fast Fourier Transform on GPUs

Manthan Verma; Soumyadeep Chatterjee; Gaurav Garg; Bharatkumar Sharma; Nishant Arya; Shashi Kumar; Anish Saxena; Mahendra K. Verma

arXiv:2202.12756 · physics.comp-ph · February 28, 2022 · SN Comput. Sci.

TL;DR

This paper introduces a multi-node GPU-FFT library built on slab decomposition and MPI, and reports its scaling on up to 512 A100 GPUs of the Selene HPC system.

Contribution

The paper presents a multi-node GPU-FFT library that uses slab decomposition for data division and MPI for communication among GPUs, benchmarked on 1024^3, 2048^3, and 4096^3 grids with up to 512 A100 GPUs.

Findings

01

Good scaling observed for 4096^3 grid with 64 to 512 GPUs

02

GPU-FFT of a 2048^3 grid on 128 GPUs has timings comparable to multicore FFT of a 1536^3 grid on 196608 cores of a Cray XC40

03

Fast computation on the A100 card and efficient communication via NVLink account for the GPU-FFT efficiency
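
The grid sizes and GPU counts in the findings above can be put in perspective with a little arithmetic. The sketch below is illustrative only: the function name `slab_footprint` is hypothetical, and the 16 bytes per grid point (double-precision complex) is an assumption, since the precision is not stated here. It counts a single copy of the array and ignores communication and work buffers.

```python
def slab_footprint(n, num_gpus, bytes_per_point=16):
    """Planes per GPU and memory for an n^3 grid split into slabs.

    bytes_per_point=16 assumes double-precision complex values
    (an assumption, not a figure taken from the paper).
    Returns (planes_per_gpu, total_GiB, per_gpu_GiB) for one array copy.
    """
    if n % num_gpus != 0:
        raise ValueError("slab decomposition here needs n divisible by num_gpus")
    planes_per_gpu = n // num_gpus          # a slab is a stack of whole planes
    total_bytes = n ** 3 * bytes_per_point  # one copy of the full grid
    gib = 2 ** 30
    return planes_per_gpu, total_bytes / gib, total_bytes / num_gpus / gib


if __name__ == "__main__":
    for n, p in [(4096, 64), (4096, 512), (2048, 128)]:
        planes, total, per_gpu = slab_footprint(n, p)
        print(f"{n}^3 on {p} GPUs: {planes} planes/GPU, "
              f"{total:.0f} GiB total, {per_gpu:.0f} GiB per GPU")
```

Under these assumptions a 4096^3 grid occupies 1024 GiB in total, which is why it has to be spread over many GPUs, and a slab decomposition can use at most as many GPUs as there are planes along the divided axis.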

Abstract

In this paper, we present the details of our multi-node GPU-FFT library, as well as its scaling on the Selene HPC system. Our library employs slab decomposition for data division and MPI for communication among GPUs. We performed GPU-FFT on $1024^{3}$, $2048^{3}$, and $4096^{3}$ grids using a maximum of 512 A100 GPUs. We observed good scaling for the $4096^{3}$ grid with 64 to 512 GPUs. We report that the timing of multicore FFT of a $1536^{3}$ grid with 196608 cores of a Cray XC40 is comparable to that of GPU-FFT of a $2048^{3}$ grid with 128 GPUs. The efficiency of GPU-FFT is due to the fast computation capabilities of the A100 card and efficient communication via NVLink.
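
To make the slab decomposition mentioned in the abstract concrete, here is a minimal single-process sketch in Python with NumPy. It is not the library's code: the function names `slab_fft3d` and `gather_slabs` are hypothetical, the "ranks" are simulated with a list of arrays, and the list reshuffle stands in for the all-to-all exchange that a real implementation would perform with MPI across GPUs.

```python
import numpy as np


def slab_fft3d(field, num_ranks):
    """3D FFT of an (n, n, n) array using a simulated slab decomposition.

    Returns a list with one array per simulated rank; rank q ends up
    holding the full x and z extent for its block of y indices.
    """
    n = field.shape[0]
    if field.shape != (n, n, n) or n % num_ranks != 0:
        raise ValueError("need a cubic grid with n divisible by num_ranks")

    # Each simulated rank owns n/num_ranks consecutive x-planes (a slab).
    slabs = np.split(field, num_ranks, axis=0)

    # Step 1: every rank transforms its own planes along y and z.
    slabs = [np.fft.fft2(s, axes=(1, 2)) for s in slabs]

    # Step 2: all-to-all exchange (simulated). Rank r cuts its slab into
    # num_ranks blocks along y and sends block q to rank q.
    blocks = [np.split(s, num_ranks, axis=1) for s in slabs]
    transposed = [
        np.concatenate([blocks[r][q] for r in range(num_ranks)], axis=0)
        for q in range(num_ranks)
    ]

    # Step 3: every rank now holds complete x-lines and transforms along x.
    return [np.fft.fft(t, axis=0) for t in transposed]


def gather_slabs(pieces):
    """Reassemble the y-distributed result into one (n, n, n) array."""
    return np.concatenate(pieces, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = rng.standard_normal((8, 8, 8)) + 1j * rng.standard_normal((8, 8, 8))
    result = gather_slabs(slab_fft3d(f, num_ranks=4))
    print(np.allclose(result, np.fft.fftn(f)))
```

The only step that moves data between ranks is the exchange in step 2, which is why the interconnect matters so much for multi-node FFT performance.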

Code & Models

Repositories

manthan-verma/gpu_fft
none · Official

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Compression Techniques