Minimizing communication in the multidimensional FFT
Thomas Koopman, Rob H. Bisseling

TL;DR
This paper introduces a multidimensional parallel FFT algorithm that minimizes communication by requiring only a single all-to-all step, enabling efficient computation on large processor counts for high-dimensional data.
Contribution
The paper generalizes the cyclic-to-cyclic 1D parallel FFT algorithm to higher dimensions, maintaining minimal communication and compatibility with existing local FFT implementations.
Findings
FFTU is competitive with state-of-the-art FFTs.
FFTU achieves significant speedups (149x and 176x) on large arrays.
In the reported experiments (MPI on the Snellius supercomputer), the algorithm scales well up to 4096 cores.
Abstract
We present a parallel algorithm for the fast Fourier transform (FFT) in higher dimensions. This algorithm generalizes the cyclic-to-cyclic one-dimensional parallel algorithm to a cyclic-to-cyclic multidimensional parallel algorithm while retaining the property of needing only a single all-to-all communication step. This is under the constraint that we use at most $\sqrt{N}$ processors for an FFT on an array with a total of $N$ elements, irrespective of the dimension $d$ or the shape of the array. The only assumption we make is that $N$ is sufficiently composite. Our algorithm starts and ends in the same data distribution. We present our multidimensional implementation FFTU which utilizes the sequential FFTW program for its local FFTs, and which can handle any dimension $d$. We obtain experimental results for $d \leq 5$ using MPI on up to 4096 cores of the supercomputer Snellius,…
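To make the one-dimensional starting point concrete, here is a minimal sketch of a cyclic-to-cyclic transform with a single redistribution step, simulated sequentially on p virtual processors. It is not FFTU's code and may differ from the authors' formulation: the names `dft` and `cyclic_fft_1d` are hypothetical, the naive `dft` helper stands in for a local FFT library call (the paper uses FFTW), the list reshuffle stands in for an MPI all-to-all, and the sketch assumes the simplifying condition that p*p divides the array length.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT; a stand-in for a local FFT library call."""
    n = len(v)
    return [sum(v[a] * cmath.exp(-2j * cmath.pi * a * k / n) for a in range(n))
            for k in range(n)]


def cyclic_fft_1d(x, p):
    """Simulate a cyclic-to-cyclic 1D transform on p virtual processors.

    Returns a list `out` with out[t][l] == X[t + l*p], i.e. the result is
    again in the cyclic distribution. Assumes p*p divides len(x).
    """
    n = len(x)
    if n % (p * p) != 0:
        raise ValueError("this sketch assumes p*p divides n")
    m = n // p  # local array length per processor

    # Cyclic input distribution: processor s owns x[s], x[s+p], x[s+2p], ...
    local = [x[s::p] for s in range(p)]

    # Step 1 (local): length-m transform, then twiddle by w_n^(s*k1).
    z = []
    for s in range(p):
        y = dft(local[s])
        z.append([y[k1] * cmath.exp(-2j * cmath.pi * s * k1 / n)
                  for k1 in range(m)])

    # Step 2 (the single all-to-all): processor t receives from every
    # processor s the entries z[s][k1] with k1 congruent to t modulo p.
    recv = [[z[s][t::p] for s in range(p)] for t in range(p)]

    # Step 3 (local): for each received column, a length-p transform
    # across the source-processor index s.
    out = []
    for t in range(p):
        cols = [dft([recv[t][s][i] for s in range(p)])
                for i in range(m // p)]
        # cols[i][k2] is X[t + i*p + m*k2]; its local index is i + (m/p)*k2.
        out.append([cols[i][k2] for k2 in range(p) for i in range(m // p)])
    return out


# Example: 36 elements on 3 virtual processors (3*3 divides 36).
x = [complex(i % 7, i % 5) for i in range(36)]
parts = cyclic_fft_1d(x, 3)
```

Because p divides the local length m in this sketch, every output index k1 + m*k2 has the same remainder modulo p as k1, which is why one redistribution is enough for the output to land back in the cyclic distribution.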
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Numerical Methods in Computational Mathematics · Advanced Data Storage Technologies
