TL;DR
This paper introduces an MPI-based method for multidimensional FFTs that reduces local data realignment by communicating discontiguous memory directly, achieving performance comparable to or better than existing libraries.
Contribution
The paper presents a new MPI-2-based approach that uses subarray datatypes and generalized all-to-all scatter/gather to redistribute multidimensional arrays efficiently in parallel FFTs.
Findings
Performance on par with or better than MPI-FFTW, P3DFFT, and 2DECOMP&FFT
Reduces local data realignments in multidimensional FFTs
Applicable to arbitrary array decompositions and processor grids
Abstract
We present a new method for performing global redistributions of multidimensional arrays essential to parallel fast Fourier (or similar) transforms. Traditional methods use standard all-to-all collective communication of contiguous memory buffers, thus necessarily requiring local data realignment steps interleaved between redistribution and transform steps. Instead, our method takes advantage of subarray datatypes and generalized all-to-all scatter/gather from the MPI-2 standard to communicate discontiguous memory buffers, effectively eliminating the need for local data realignments. Although generalized all-to-all communication of discontiguous data is generally slower, our proposal saves on local work. For a range of strong and weak scaling tests, we found the overall performance of our method to be on par with, and often better than, well-established libraries like MPI-FFTW, P3DFFT,…
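The redistribution idea in the abstract can be illustrated with a small pure-Python simulation. This is only a sketch under stated assumptions, not the authors' implementation: no MPI is used, the "ranks" are plain list entries, the balanced block decomposition is one possible choice, and the names `decompose`, `subarray_descriptors`, and `redistribute` are hypothetical. In a real MPI program, each `(starts, subsizes)` pair would presumably be passed to `MPI_Type_create_subarray`, and the exchange loop would be replaced by a single `MPI_Alltoallw` call.

```python
def decompose(n, nparts, part):
    """Balanced 1-D block decomposition (an assumed scheme).

    Returns (start, size) of block `part` when `n` items are split
    into `nparts` blocks whose sizes differ by at most one.
    """
    base, extra = divmod(n, nparts)
    size = base + (1 if part < extra else 0)
    start = part * base + min(part, extra)
    return start, size


def subarray_descriptors(local_shape, axis, nparts):
    """Split a local array along `axis` into one subarray per peer.

    Each descriptor is (starts, subsizes), i.e. the kind of values a
    subarray datatype needs to address a discontiguous block in place.
    """
    descs = []
    for p in range(nparts):
        start, size = decompose(local_shape[axis], nparts, p)
        starts = [0] * len(local_shape)
        subsizes = list(local_shape)
        starts[axis] = start
        subsizes[axis] = size
        descs.append((starts, subsizes))
    return descs


def redistribute(n0, n1, nprocs):
    """Simulate moving an n0 x n1 array from row slabs to column slabs."""
    glob = [[i * n1 + j for j in range(n1)] for i in range(n0)]

    # Input layout: rank r owns a block of rows and all columns.
    rows = [decompose(n0, nprocs, r) for r in range(nprocs)]
    cols = [decompose(n1, nprocs, r) for r in range(nprocs)]
    A = [[row[:] for row in glob[s:s + n]] for s, n in rows]
    # Output layout: rank r owns all rows and a block of columns.
    B = [[[None] * n for _ in range(n0)] for _, n in cols]

    # Stand-in for the generalized all-to-all: every block is copied
    # straight from its place in A to its place in B, with no packing
    # into contiguous buffers and no local transpose afterwards.
    for src in range(nprocs):
        send = subarray_descriptors((rows[src][1], n1), 1, nprocs)
        for dst in range(nprocs):
            recv = subarray_descriptors((n0, cols[dst][1]), 0, nprocs)
            (s0, s1), (m0, m1) = send[dst]
            (r0, r1), (k0, k1) = recv[src]
            assert (m0, m1) == (k0, k1)  # send/receive blocks must match
            for i in range(m0):
                for j in range(m1):
                    B[dst][r0 + i][r1 + j] = A[src][s0 + i][s1 + j]
    return glob, B


if __name__ == "__main__":
    glob, B = redistribute(5, 7, 3)
    for rank, block in enumerate(B):
        print(rank, block)
```

The point of the sketch is that the sender splits its slab along the axis that will become distributed, the receiver splits its slab along the axis that was distributed, and the matching blocks have identical shapes, so data can land in its final position directly.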
