Implementing FFTs in Practice
Steven G. Johnson, Matteo Frigo

TL;DR
This paper reviews engineering considerations for implementing high-performance FFTs, highlighting differences from textbook algorithms and discussing tradeoffs in optimization techniques using FFTW as a case study.
Contribution
It provides a high-level overview of practical engineering strategies for optimizing FFT implementations on modern hardware.
Findings
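Optimized FFTs differ significantly from textbook radix-2 Cooley-Tukey algorithms because they must compensate for the memory hierarchy and exploit the large register sets and deep pipelines of modern CPUs.
Tradeoffs in the use of recursion, twiddle-factor generation, and code generation strongly affect performance.
A case study of the FFTW library illustrates these optimization approaches in practice.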
Abstract
This review article was first published in 2008 as chapter 11 in the book "Fast Fourier Transforms," edited by C. S. Burrus, for the Connexions project at Rice University, which is sadly no longer online. It gives a high-level overview of some of the engineering considerations that arise in high-performance implementations of fast Fourier transforms (FFTs). It explains why optimized FFTs are very different from textbook "radix-2 Cooley-Tukey" FFT algorithms, in order to compensate for the memory hierarchy and exploit the large register sets and deep pipelines of modern CPUs. Using the FFTW library as a case study, it discusses tradeoffs in the use of recursion, generation of twiddle factors, code generation, and other algorithmic choices.
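For orientation, the textbook "radix-2 Cooley-Tukey" baseline that the abstract contrasts with optimized implementations can be sketched roughly as follows. This is an illustrative sketch only, not code from the paper or from FFTW: the function names `fft_radix2` and `dft_naive` are hypothetical, the forward transform is assumed to use the exp(-2πi jk/n) sign convention, and the input length is assumed to be a power of two. It recomputes each twiddle factor on the fly and allocates new lists at every recursion level, which is the kind of simplicity a tuned library is expected to avoid.

```python
import cmath


def fft_radix2(x):
    """Textbook recursive radix-2 decimation-in-time FFT (illustrative only).

    Assumes len(x) is a power of two and uses the exp(-2*pi*i*j*k/n)
    sign convention for the forward transform.
    """
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(x[0])]
    # Split into even- and odd-indexed halves and transform each recursively.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor, recomputed here for clarity rather than precomputed.
        w = cmath.exp(-2j * cmath.pi * k / n)
        t = w * odd[k]
        out[k] = even[k] + t            # butterfly: upper output
        out[k + n // 2] = even[k] - t   # butterfly: lower output
    return out


def dft_naive(x):
    """O(n^2) reference DFT, used here only to check the FFT sketch."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

Comparing `fft_radix2(x)` against `dft_naive(x)` on a short power-of-two input is a quick sanity check that the butterflies and twiddle factors are wired correctly.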
Taxonomy
Topics: Digital Filter Design and Implementation · Parallel Computing and Optimization Techniques · VLSI and Analog Circuit Testing
