tcFFT: Accelerating Half-Precision FFT through Tensor Cores

Binrui Li; Shenggan Cheng; James Lin

arXiv:2104.11471·cs.DC·April 26, 2021·6 cites

TL;DR

This paper introduces tcFFT, a half-precision FFT implementation that runs on NVIDIA Tensor Cores and achieves speedups of up to 3.24x over NVIDIA's cuFFT on V100 GPUs.

Contribution

We developed tcFFT, an FFT implementation that runs half-precision FFTs on Tensor Cores using two optimizations: single-element manipulation on Tensor Core fragments, and a fine-grained data arrangement matched to the GPU memory access pattern.

Findings

01. tcFFT outperforms cuFFT by up to 3.24x on V100 GPUs.

02. tcFFT supports batched 1D and 2D FFTs of various sizes.

03. tcFFT achieves high performance through specialized fragment manipulation and data arrangement.

Abstract

The Fast Fourier Transform (FFT) is an essential tool in scientific and engineering computation. The increasing demand for mixed-precision FFT has made it possible to use half-precision floating-point (FP16) arithmetic for higher speed and energy savings. Specializing in lower precision, NVIDIA Tensor Cores can deliver extremely high computation performance. However, their fixed computation pattern makes it hard to exploit the computing power of Tensor Cores in FFT. Therefore, we developed tcFFT to accelerate FFT with Tensor Cores. tcFFT supports batched 1D and 2D FFTs of various sizes and exploits a set of optimizations to achieve high performance: 1) single-element manipulation on Tensor Core fragments to support special operations needed by FFT; 2) fine-grained data arrangement design to coordinate with the GPU memory access pattern. We evaluated our tcFFT and the NVIDIA cuFFT in…
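To make the abstract's central idea concrete, here is a minimal sketch of how an FFT can be recast as small matrix products, which is the only kind of operation a Tensor Core performs. It is not the tcFFT kernel: it runs on the CPU in double precision, and the function names (dft_matrix, matmul, naive_dft, fft_two_stage) are hypothetical helpers chosen for illustration. It uses the standard Cooley-Tukey factorization n = n1 * n2: one matrix product with a small DFT matrix, an element-wise twiddle multiplication, then a second matrix product. The element-wise step is presumably the sort of operation that does not fit the fixed matrix-multiply pattern and calls for per-element access to the intermediate result.

```python
import cmath


def dft_matrix(n):
    """Return the n x n DFT matrix F[r][c] = exp(-2*pi*i*r*c/n)."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]


def matmul(a, b):
    """Dense matrix product; stands in for a Tensor Core matrix multiply."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def naive_dft(x):
    """Reference O(n^2) DFT computed as one matrix-vector product."""
    column = [[v] for v in x]
    return [row[0] for row in matmul(dft_matrix(len(x)), column)]


def fft_two_stage(x, n1, n2):
    """Length n1*n2 DFT as two small matrix products plus a twiddle step."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Reshape the input into an n1 x n2 matrix: a[m1][m2] = x[m1 + n1*m2].
    a = [[x[m1 + n1 * m2] for m2 in range(n2)] for m1 in range(n1)]
    # Stage 1: length-n2 DFT of every row, done as one matrix product.
    b = matmul(a, dft_matrix(n2))
    # Twiddle step: element-wise multiply by exp(-2*pi*i*m1*k2/n).
    c = [[b[m1][k2] * cmath.exp(-2j * cmath.pi * m1 * k2 / n)
          for k2 in range(n2)]
         for m1 in range(n1)]
    # Stage 2: length-n1 DFT of every column, again one matrix product.
    d = matmul(dft_matrix(n1), c)
    # Output index is n2*k1 + k2, i.e. row-major flattening of d.
    return [d[k1][k2] for k1 in range(n1) for k2 in range(n2)]
```

Calling fft_two_stage(x, 16, 4) on a 64-point input should agree with naive_dft(x) up to floating-point rounding. The factor 16 is picked here only because the CUDA WMMA API exposes 16x16x16 half-precision tiles; on real hardware the complex products would also have to be split into real-valued FP16 multiplications, which this sketch does not attempt.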

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Numerical Methods and Algorithms · Advanced Data Storage Technologies