Computing FFTs at Target Precision Using Lower-Precision FFTs
Shota Kawakami, Daisuke Takahashi

TL;DR
This paper introduces a method for computing FFTs at a target precision by combining lower-precision FFTs with number theoretic transforms (NTTs). The reported relative error is lower than that of FFTW and stays stable across FFT lengths.
Contribution
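It applies the Ozaki scheme to the cyclic convolution in the Bluestein FFT and computes the split-component convolutions exactly using NTTs and the Chinese remainder theorem. An upper bound on the number of splits and an NTT-domain accumulation strategy reduce the NTT call count.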
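The exact-convolution idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two NTT-friendly primes, the recursive radix-2 transform, and all function names are assumptions chosen for illustration, and the Ozaki splitting of floating-point inputs into integer components is not shown. The sketch computes a cyclic convolution of integer vectors modulo each prime and recombines the residues with the Chinese remainder theorem, which is exact as long as every true output value is smaller in magnitude than half the product of the primes.

```python
# Assumed NTT-friendly primes (p - 1 is divisible by a large power of two),
# each paired with a primitive root. These are illustrative choices only.
PRIMES = [(998244353, 3), (469762049, 3)]


def _transform(a, p, w):
    """Recursive radix-2 NTT of `a` modulo p, with w a primitive len(a)-th root of unity."""
    n = len(a)
    if n == 1:
        return list(a)
    even = _transform(a[0::2], p, w * w % p)
    odd = _transform(a[1::2], p, w * w % p)
    out = [0] * n
    t = 1
    for k in range(n // 2):
        v = t * odd[k] % p
        out[k] = (even[k] + v) % p
        out[k + n // 2] = (even[k] - v) % p
        t = t * w % p
    return out


def cyclic_convolve_mod(a, b, p, g):
    """Cyclic convolution of integer lists a and b modulo the prime p.

    len(a) == len(b) must be a power of two that divides p - 1.
    """
    n = len(a)
    w = pow(g, (p - 1) // n, p)
    fa = _transform([x % p for x in a], p, w)
    fb = _transform([x % p for x in b], p, w)
    fc = [x * y % p for x, y in zip(fa, fb)]
    # Inverse transform: use the inverse root, then scale by n^{-1} mod p.
    c = _transform(fc, p, pow(w, p - 2, p))
    n_inv = pow(n, p - 2, p)
    return [x * n_inv % p for x in c]


def exact_cyclic_convolution(a, b):
    """Exact integer cyclic convolution via two NTTs and the CRT.

    Valid only if every true output has magnitude below p1 * p2 / 2.
    """
    (p1, g1), (p2, g2) = PRIMES
    c1 = cyclic_convolve_mod(a, b, p1, g1)
    c2 = cyclic_convolve_mod(a, b, p2, g2)
    m = p1 * p2
    p1_inv = pow(p1, p2 - 2, p2)
    out = []
    for r1, r2 in zip(c1, c2):
        x = r1 + p1 * ((r2 - r1) * p1_inv % p2)  # unique residue in [0, m)
        out.append(x - m if x > m // 2 else x)   # map to the signed range
    return out


print(exact_cyclic_convolution([1, 2, 3, 4], [5, -6, 7, 8]))
```

Because all arithmetic is on integers modulo a prime, no rounding error enters the convolution itself. In this sketch, adding more primes would widen the range of representable results at the cost of more transforms.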
Findings
Relative error is lower than that of FFTW and of triple-single precision.
The error remains stable across FFT lengths, with at most 96 NTT calls.
Execution time is 107-1315 times that of FFTW's double-precision FFT.
Abstract
Modern processors deliver higher throughput for lower-precision arithmetic than for higher-precision arithmetic. For matrix multiplication, the Ozaki scheme exploits this performance gap by splitting the inputs into lower-precision components and delegating the computation to optimized lower-precision routines. However, no similar approach exists for the fast Fourier transform (FFT). Here, we propose a method that computes target-precision FFTs using lower-precision FFTs by applying the Ozaki scheme to the cyclic convolution in the Bluestein FFT. The split component convolutions are computed exactly using the number theoretic transform (NTT), an FFT over a finite field, instead of floating-point FFTs, combined with the Chinese remainder theorem. We introduce an upper bound on the number of splits and an NTT-domain accumulation strategy to reduce the NTT call count. As a concrete…
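To show where the cyclic convolution arises in the Bluestein FFT, here is a small Python sketch under stated assumptions. It uses ordinary double-precision complex arithmetic and a direct quadratic-time cyclic convolution as a stand-in; in the method described above, that convolution is the step handled by the split components and NTTs. The function names are hypothetical and nothing here reproduces the paper's precision handling.

```python
import cmath


def cyclic_convolve(u, v):
    """Direct O(m^2) cyclic convolution; a placeholder for a fast, exact routine."""
    m = len(u)
    return [sum(u[j] * v[(k - j) % m] for j in range(m)) for k in range(m)]


def bluestein_dft(x):
    """DFT of arbitrary length n, rewritten as one cyclic convolution.

    Uses the identity j*k = (j^2 + k^2 - (k - j)^2) / 2, so that
    X[k] = c[k] * sum_j (x[j] * c[j]) * conj(c[k - j]),
    with the chirp c[k] = exp(-i * pi * k^2 / n).
    """
    n = len(x)
    # Pad to a power of two m >= 2n - 1 so the cyclic wrap-around is harmless.
    m = 1
    while m < 2 * n - 1:
        m *= 2
    # k^2 is reduced mod 2n because the chirp is periodic in k^2 with period 2n.
    chirp = [cmath.exp(-1j * cmath.pi * (k * k % (2 * n)) / n) for k in range(n)]
    u = [x[k] * chirp[k] for k in range(n)] + [0] * (m - n)
    v = [0] * m
    for k in range(n):
        v[k] = chirp[k].conjugate()
        v[(m - k) % m] = chirp[k].conjugate()  # negative index -k, wrapped
    conv = cyclic_convolve(u, v)
    return [chirp[k] * conv[k] for k in range(n)]


x = [1, 2 - 1j, 0.5, -3, 4j]
print(bluestein_dft(x))
```

The padded length is a power of two regardless of the input length, which is what lets a power-of-two transform (floating-point FFT or NTT) carry out the convolution for any n.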
