Experiences Porting Distributed Applications to Asynchronous Tasks: A Multidimensional FFT Case-study
Alexander Strack, Christopher Taylor, Patrick Diehl, Dirk Pflüger

TL;DR
This case study evaluates the effectiveness of asynchronous many-task runtimes for multidimensional FFTs, finding limited benefits for the FFT application but demonstrating competitive performance of HPX-based implementations.
Contribution
The paper provides an empirical analysis of porting FFTW to an asynchronous runtime, highlighting overheads, pitfalls, and performance comparisons with traditional backends.
Findings
Asynchronous task execution does not improve FFT performance, which the study attributes to cache effects.
HPX backend is competitive with pthreads, OpenMP, and MPI backends.
HPX's LCI parcelport accelerates communication by up to a factor of 5.
Abstract
Parallel algorithms relying on synchronous parallelization libraries often experience adverse performance due to global synchronization barriers. Asynchronous many-task runtimes offer task futurization capabilities that minimize or remove the need for global synchronization barriers. This paper conducts a case study of the multidimensional Fast Fourier Transform to identify which applications will benefit from the asynchronous many-task model. Our basis is the popular FFTW library. We use the asynchronous many-task model HPX and a one-dimensional FFTW backend to implement multiple versions using different HPX features and highlight overheads and pitfalls during migration. Furthermore, we add an HPX threading backend to FFTW. The case study analyzes shared memory scaling properties between our HPX-based parallelization and FFTW with its pthreads, OpenMP, and HPX backends. The case study…
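To make the abstract's setup more concrete, here is a minimal sketch of how a two-dimensional transform can be assembled from one-dimensional transforms with one task per row. It is not the authors' code. Python's `ThreadPoolExecutor` futures stand in for HPX futures, a naive `dft_1d` stands in for a one-dimensional FFTW plan, and all function names are hypothetical. It illustrates only the task structure and the synchronization point before the transpose, not performance.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def dft_1d(row):
    """Naive O(n^2) 1D DFT; an assumed stand-in for a 1D FFTW plan."""
    n = len(row)
    return [
        sum(row[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def fft_2d_tasks(matrix, workers=4):
    """2D DFT by row-column decomposition, one task per row."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 1: launch one task per row; each submit returns a future.
        row_futures = [pool.submit(dft_1d, row) for row in matrix]
        # Waiting on every future is the synchronization point
        # that must complete before the transpose can start.
        rows_done = [f.result() for f in row_futures]
        # Step 2: transpose so the former columns become rows.
        cols = transpose(rows_done)
        # Step 3: transform the former columns, again one task per row.
        col_futures = [pool.submit(dft_1d, col) for col in cols]
        cols_done = [f.result() for f in col_futures]
    # Step 4: transpose back to the original layout.
    return transpose(cols_done)


def dft_2d_direct(matrix):
    """Reference: direct evaluation of the 2D DFT definition."""
    n, m = len(matrix), len(matrix[0])
    return [
        [
            sum(
                matrix[a][b] * cmath.exp(-2j * cmath.pi * (a * k / n + b * l / m))
                for a in range(n)
                for b in range(m)
            )
            for l in range(m)
        ]
        for k in range(n)
    ]
```

In this sketch, the second batch of one-dimensional transforms cannot start until every row of the first batch has finished, because the transpose reads all rows. Because of Python's global interpreter lock, the threads here give no real speedup; the example only shows where tasks and synchronization sit in the algorithm.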
