TurboFFT: A High-Performance Fast Fourier Transform with Fault Tolerance on GPU
Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Zizhong Chen, and Franck Cappello

TL;DR
TurboFFT is a GPU-based FFT implementation that incorporates a two-sided checksum scheme for efficient error detection and correction, maintaining high performance with minimal overhead even under fault conditions.
Contribution
It introduces a novel fault-tolerant FFT design with two-sided checksum encoding, kernel fusion, and template-based code generation, achieving high performance and robustness.
Findings
TurboFFT achieves 23% better performance than existing fault-tolerant FFT schemes.
It incurs only 7-15% overhead compared to cuFFT under error injections.
It demonstrates effective error detection and correction on NVIDIA GPUs.
Abstract
The Fast Fourier Transform (FFT), as a core computation in a wide range of scientific applications, is increasingly threatened by reliability issues. In this paper, we introduce TurboFFT, a high-performance FFT implementation equipped with a two-sided checksum scheme that detects and corrects silent data corruptions at computing units efficiently. The proposed two-sided checksum addresses the error propagation issue by encoding a batch of input signals with different linear combinations, which not only allows fast batched error detection but also enables error correction on-the-fly instead of recomputing. We explore two-sided checksum designs at the kernel, thread, and threadblock levels, and provide a baseline FFT implementation competitive with the state-of-the-art, closed-source cuFFT. We demonstrate a kernel fusion strategy to mitigate and overlap the computation/memory overhead…
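The idea of encoding a batch with different linear combinations can be illustrated with a small sketch. This is not TurboFFT's actual kernel or its exact encoding: it is a pure-Python illustration that relies only on the linearity of the DFT, uses a naive O(N^2) transform in place of a GPU FFT, and assumes the weight vectors (1, 1, ..., 1) and (1, 2, ..., B), at most one corrupted signal per batch, and fault-free checksum transforms. The names `dft`, `combine`, and `checked_batch_dft` are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(N^2) DFT; a stand-in for a real FFT kernel."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def combine(rows, weights):
    """Weighted sum of equal-length vectors."""
    return [sum(w * r[i] for w, r in zip(weights, rows))
            for i in range(len(rows[0]))]


def checked_batch_dft(batch, fault=None, tol=1e-6):
    """Transform a batch and repair at most one corrupted output signal.

    fault = (row, col, delta) injects an error into the output, for
    demonstration only. Returns (outputs, repaired_row or None).
    """
    b = len(batch)
    w1 = [1.0] * b                          # assumed weights: all ones
    w2 = [float(i + 1) for i in range(b)]   # assumed weights: 1, 2, ..., B

    # Encode the inputs, then transform the two checksum signals.
    c1 = dft(combine(batch, w1))
    c2 = dft(combine(batch, w2))

    out = [dft(x) for x in batch]
    if fault is not None:
        row, col, delta = fault
        out[row][col] += delta              # simulated silent data corruption

    # By linearity, both differences are ~0 when no error occurred.
    d1 = [c - s for c, s in zip(c1, combine(out, w1))]
    d2 = [c - s for c, s in zip(c2, combine(out, w2))]

    k = max(range(len(d1)), key=lambda i: abs(d1[i]))
    if abs(d1[k]) <= tol:
        return out, None

    # With one faulty signal r: d1 = -e and d2 = -(r + 1) * e,
    # so the ratio locates r and d1 itself is the correction.
    row = round((d2[k] / d1[k]).real) - 1
    out[row] = [y + d for y, d in zip(out[row], d1)]
    return out, row
```

Calling `checked_batch_dft(batch, fault=(1, 2, 5 + 0j))` on a small batch should report row 1 as repaired and return outputs that match the fault-free run up to rounding. The repair costs two extra transforms per batch rather than a recomputation of the faulty signal, which is the property the abstract highlights.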
Taxonomy
Topics: Photonic and Optical Devices · Spectroscopy Techniques in Biomedical and Chemical Research · Optical Coherence Tomography Applications
