Intel Cilk Plus for Complex Parallel Algorithms: "Enormous Fast Fourier Transform" (EFFT) Library
Ryo Asai, Andrey Vladimirov

TL;DR
This paper introduces the EFFT library, a highly efficient parallel 1D DFT implementation for multi-core Intel Xeon processors. It uses Intel Cilk Plus for nested parallelism and is not tuned to specific hardware details.
Contribution
It presents a new parallel DFFT library that outperforms existing libraries such as Intel MKL and FFTW. It also shows that Cilk Plus is effective for complex recursive parallel algorithms.
Findings
EFFT is up to 1.5x faster than the Intel MKL parallel DFT for 1D sizes 2^N with N >= 21
EFFT is up to 2.5x faster than FFTW
The approach simplifies parallelization and needs no tuning to the number of cores or to cache metrics
Abstract
In this paper we demonstrate a methodology for parallelizing the computation of large one-dimensional discrete fast Fourier transforms (DFFTs) on multi-core Intel Xeon processors. DFFTs based on the recursive Cooley-Tukey method must control cache utilization, memory bandwidth and vector hardware usage, and at the same time scale across multiple threads or compute nodes. Our method builds on the single-threaded Intel Math Kernel Library (MKL) implementation of the DFFT and uses the Intel Cilk Plus framework for thread parallelism. We demonstrate the ability of Intel Cilk Plus to handle parallel recursion with nested loop-centric parallelism without tuning the code to the number of cores or cache metrics. The result of our work is a library called EFFT that performs 1D DFTs of size 2^N for N>=21 faster than the corresponding Intel MKL parallel DFT implementation by up to 1.5x, and faster…
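To make "parallel recursion with nested loop-centric parallelism" concrete, here is a minimal sketch of a textbook recursive radix-2 Cooley-Tukey FFT whose two half-size sub-transforms run as parallel tasks. This is not EFFT's algorithm: per the abstract, EFFT builds on single-threaded MKL DFT calls, and its actual decomposition is not shown here. Because Cilk Plus keywords are no longer supported by current compilers, the sketch uses std::async as a stand-in for cilk_spawn/cilk_sync. The comments mark where the Cilk Plus keywords (and a cilk_for over the butterfly loop) would go. The names fft_rec, fft and the grain size kGrain are hypothetical. Note that the spawn cutoff depends only on problem size, not on the core count, which mirrors the abstract's "no tuning" claim. However, std::async has none of the work-stealing scheduling that Cilk Plus provides.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <future>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical grain size: sub-transforms shorter than this run serially.
// It depends only on problem size, not on the number of cores.
constexpr std::size_t kGrain = 1 << 12;

// Out-of-place recursive radix-2 decimation-in-time FFT.
// Reads n samples from `in` with spacing `stride`; writes n outputs to `out`.
void fft_rec(const cplx* in, cplx* out, std::size_t n, std::size_t stride) {
    if (n == 1) {
        out[0] = in[0];
        return;
    }
    const std::size_t half = n / 2;
    // Transform of even-indexed samples goes to out[0, half),
    // transform of odd-indexed samples goes to out[half, n).
    if (n >= kGrain) {
        // Cilk Plus equivalent: cilk_spawn fft_rec(in, out, half, 2 * stride);
        auto even = std::async(std::launch::async, fft_rec, in, out, half, 2 * stride);
        fft_rec(in + stride, out + half, half, 2 * stride);
        even.get();  // Cilk Plus equivalent: cilk_sync;
    } else {
        fft_rec(in, out, half, 2 * stride);
        fft_rec(in + stride, out + half, half, 2 * stride);
    }
    // Butterflies combine the two half-size transforms. The iterations are
    // independent, so in Cilk Plus this loop could be a cilk_for (nested
    // loop-centric parallelism inside the parallel recursion).
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < half; ++k) {
        const cplx w = std::polar(1.0, -2.0 * pi * double(k) / double(n));
        const cplx e = out[k];
        const cplx o = w * out[k + half];
        out[k] = e + o;
        out[k + half] = e - o;
    }
}

// Forward DFT of a power-of-two-length signal (the paper targets sizes 2^N).
std::vector<cplx> fft(const std::vector<cplx>& x) {
    const std::size_t n = x.size();
    assert(n > 0 && (n & (n - 1)) == 0 && "length must be a power of two");
    std::vector<cplx> out(n);
    fft_rec(x.data(), out.data(), n, 1);
    return out;
}
```

For example, `fft` applied to the eight samples 0, 1, ..., 7 yields a DC term X[0] of 28. Spawning only above a size threshold keeps per-task overhead small relative to the work in each task. In a real work-stealing runtime, that threshold alone would be enough to balance load across any number of cores.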
