Implementing FFTs in Practice

Steven G. Johnson; Matteo Frigo

arXiv:2602.23525 · math.NA · March 2, 2026

Open Access

TL;DR

This paper reviews engineering considerations for implementing high-performance FFTs, highlighting differences from textbook algorithms and discussing tradeoffs in optimization techniques using FFTW as a case study.

Contribution

It provides a high-level overview of practical engineering strategies for optimizing FFT implementations on modern hardware.

Findings

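01

Optimized FFTs differ substantially from the textbook radix-2 Cooley-Tukey algorithm, because they must account for the memory hierarchy and exploit the large register sets and deep pipelines of modern CPUs.

02

The use of recursion, the generation of twiddle factors, and code generation each involve tradeoffs that affect performance.

03

The FFTW library serves as a case study for these optimization choices.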

Abstract

This review article was first published in 2008 as chapter 11 in the book "Fast Fourier Transforms," edited by C. S. Burrus, for the Connexions project at Rice University, which is sadly no longer online. It gives a high-level overview of some of the engineering considerations that arise in high-performance implementations of fast Fourier transforms (FFTs). It explains why optimized FFTs are very different from textbook "radix-2 Cooley-Tukey" FFT algorithms: they must compensate for the memory hierarchy and exploit the large register sets and deep pipelines of modern CPUs. Using the FFTW library as a case study, it discusses tradeoffs in the use of recursion, generation of twiddle factors, code generation, and other algorithmic choices.
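
For orientation, the textbook baseline that the abstract contrasts against can be sketched in a few lines of Python. This sketch is not taken from the paper or from FFTW; the function names fft_radix2 and dft_naive are illustrative assumptions. It shows a recursive radix-2 decimation-in-time Cooley-Tukey FFT for power-of-two lengths, with a direct O(n^2) DFT alongside it as a correctness reference.

```python
import cmath


def fft_radix2(x):
    """Textbook recursive radix-2 decimation-in-time FFT (illustrative sketch).

    len(x) must be a power of two. Returns a new list of complex values.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return [complex(x[0])]

    # Split into even- and odd-indexed subsequences and transform each.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])

    half = n // 2
    out = [0j] * n
    for k in range(half):
        # Twiddle factor, recomputed on every use (the naive choice).
        w = cmath.exp(-2j * cmath.pi * k / n)
        t = w * odd[k]
        out[k] = even[k] + t          # butterfly, upper output
        out[k + half] = even[k] - t   # butterfly, lower output
    return out


def dft_naive(x):
    """Direct O(n^2) DFT, used here only as a correctness reference."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
    fast = fft_radix2(data)
    slow = dft_naive(data)
    print(max(abs(a - b) for a, b in zip(fast, slow)) < 1e-9)
```

This version recurses down to single elements, allocates fresh lists at every level, and recomputes each twiddle factor on the fly. Those are the kinds of choices (recursion, twiddle factor generation) that the abstract names as tradeoffs in a high-performance implementation.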

Taxonomy

Topics: Digital Filter Design and Implementation · Parallel Computing and Optimization Techniques · VLSI and Analog Circuit Testing