Performance Tuning of a Parallel 3-D FFT Package OpenFFT
Truong Vinh Truong Duy, Taisuke Ozaki

TL;DR
This paper presents performance analysis and auto-tuning of the OpenFFT package, optimizing communication strategies for large-scale 3-D FFT computations across various parallel computing platforms.
Contribution
It introduces six communication methods and an auto-tuning mechanism to enhance OpenFFT's performance on diverse hardware and scales.
Findings
Optimized communication methods improve performance.
Auto-tuning selects the best communication method at run time.
OpenFFT outperforms some state-of-the-art packages.
Abstract
The fast Fourier transform (FFT) is a primitive kernel in numerous fields of science and engineering. OpenFFT is an open-source parallel package for 3-D FFTs, built on a communication-optimal domain decomposition method that achieves a minimal volume of communication. In this paper, we analyze and tune the performance of OpenFFT, paying particular attention to tuning the communication that dominates the run time of large-scale calculations. We first analyze its performance on different machines to understand the behavior of both the package and the machines. Based on this analysis, we develop six communication methods with the aim of covering varied calculation scales on a variety of computational platforms. OpenFFT is then augmented with auto-tuning of communication, which selects the best method at run time based on their performance.…
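The run-time selection described in the abstract can be illustrated with a minimal sketch. This is not OpenFFT's implementation: the function name `autotune`, its parameters, and the idea of representing each communication method as a zero-argument callable are all assumptions made for illustration. The sketch times every candidate a few times and keeps the one with the smallest best-case time.

```python
import time
from typing import Callable, Dict, Tuple


def autotune(
    candidates: Dict[str, Callable[[], None]],
    trials: int = 3,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[str, Dict[str, float]]:
    """Time each candidate and return (name of fastest, best time per candidate).

    Hypothetical helper: `candidates` maps a method name to a callable that
    performs one communication step with that method.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    if trials < 1:
        raise ValueError("trials must be >= 1")

    timings: Dict[str, float] = {}
    for name, run in candidates.items():
        best = float("inf")
        for _ in range(trials):
            start = clock()
            run()
            elapsed = clock() - start
            # Keep the minimum over trials to reduce the effect of noise.
            best = min(best, elapsed)
        timings[name] = best

    # The winner is the candidate with the smallest best-case time.
    winner = min(timings, key=timings.get)
    return winner, timings
```

In a real MPI program each process would measure its own time, so the per-process timings would typically be combined (for example, by taking the maximum across processes) before choosing, so that every process agrees on the same method. That step is omitted here.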
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Model Reduction and Neural Networks
