FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Daniel Y. Fu, Hermann Kumbong, Eric Nguyen, Christopher Ré

TL;DR
FlashFFTConv speeds up FFT convolutions by computing the FFT on matrix multiply units and fusing kernels. This accelerates long-sequence models and enables new applications in genomics and vision.
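For background, here is a NumPy sketch of the FFT convolution that FlashFFTConv accelerates. It is a minimal reference, not the FlashFFTConv kernel, and it makes two assumptions: the convolution is causal, and the input and filter both have length N. Both are zero-padded to 2N, so the FFT's circular convolution equals the linear one for the first N outputs. The helper names `fft_conv` and `direct_conv` are hypothetical.

```python
import numpy as np

def fft_conv(u: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Causal convolution of a length-N signal u with a length-N filter k via the FFT.

    Zero-padding to 2N prevents circular wrap-around, so the first N outputs
    equal a direct causal convolution. Cost is O(N log N) instead of O(N^2).
    """
    n = u.shape[-1]
    fft_size = 2 * n
    u_f = np.fft.rfft(u, n=fft_size)   # spectrum of the zero-padded input
    k_f = np.fft.rfft(k, n=fft_size)   # spectrum of the zero-padded filter
    y = np.fft.irfft(u_f * k_f, n=fft_size)  # pointwise product = convolution
    return y[..., :n]                  # keep the causal part

def direct_conv(u: np.ndarray, k: np.ndarray) -> np.ndarray:
    """O(N^2) reference: full linear convolution truncated to the first N outputs."""
    n = u.shape[-1]
    return np.convolve(u, k)[:n]

rng = np.random.default_rng(0)
N = 1024
u = rng.standard_normal(N)
k = rng.standard_normal(N)
print(np.allclose(fft_conv(u, k), direct_conv(u, k)))  # True
```

The pointwise product in the frequency domain is cheap. In this formulation the time goes to the forward and inverse FFTs, and the abstract identifies those as the poorly utilized step.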
Contribution
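The paper presents FlashFFTConv, a method that computes FFT convolutions through a matrix decomposition that runs on matrix multiply units and allows kernel fusion. It also introduces two sparse convolution algorithms, partial convolutions and frequency-sparse convolutions. Together these improve speed over existing implementations such as PyTorch's.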
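The classic four-step (Cooley-Tukey) factorization shows how an FFT can be expressed with matrix multiplies. For N = N1·N2, the length-N FFT becomes a dense N1-point DFT matrix product, an elementwise twiddle-factor product, and a dense N2-point DFT matrix product. Matrix multiply units handle this kind of work well. The NumPy sketch below is our illustration of the general idea, not FlashFFTConv's decomposition or kernel. It assumes N factors as N1·N2, and the helper names `dft_matrix` and `fft_via_matmul` are hypothetical. Precision handling, I/O, and fusion on GPU tensor cores are not shown.

```python
import numpy as np

def dft_matrix(n: int) -> np.ndarray:
    """Dense n x n DFT matrix with entries F[j, k] = exp(-2*pi*i*j*k / n)."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)

def fft_via_matmul(x: np.ndarray, n1: int, n2: int) -> np.ndarray:
    """Length-(n1*n2) DFT computed as two dense matmuls plus a twiddle product."""
    n = n1 * n2
    assert x.shape[-1] == n, "input length must equal n1 * n2"
    a = x.reshape(n1, n2)                      # a[p, q] = x[n2*p + q]
    b = dft_matrix(n1) @ a                     # n1-point DFTs down each column
    k1 = np.arange(n1)[:, None]
    q = np.arange(n2)[None, :]
    c = b * np.exp(-2j * np.pi * k1 * q / n)   # twiddle factors w_N^(k1*q)
    d = c @ dft_matrix(n2)                     # n2-point DFTs along each row
    return d.T.reshape(n)                      # output index is k1 + n1*k2

rng = np.random.default_rng(0)
x = rng.standard_normal(1024) + 1j * rng.standard_normal(1024)
print(np.allclose(fft_via_matmul(x, 32, 32), np.fft.fft(x)))  # True
```

Design note: with dense DFT matrices, the two products cost N·(N1 + N2) complex multiply-adds. That is more arithmetic than a radix-2 FFT, but all of it is arranged as matrix multiplication. The trade-off only pays off on hardware with fast matrix units.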
Findings
Up to 7.93× faster FFT convolutions compared to PyTorch.
Achieves up to 4.4× end-to-end speedup in models.
Enables processing of the longest human genes with 2.3 million base pairs.
Abstract
Convolution models with long filters have demonstrated state-of-the-art reasoning abilities in many long-sequence tasks but lag behind the most optimized Transformers in wall-clock time. A major bottleneck is the Fast Fourier Transform (FFT), which allows long convolutions to run in O(N log N) time in sequence length N but has poor hardware utilization. In this paper, we study how to optimize the FFT convolution. We find two key bottlenecks: the FFT does not effectively use specialized matrix multiply units, and it incurs expensive I/O between layers of the memory hierarchy. In response, we propose FlashFFTConv. FlashFFTConv uses a matrix decomposition that computes the FFT using matrix multiply units and enables kernel fusion for long sequences, reducing I/O. We also present two sparse convolution algorithms, 1) partial convolutions and 2) frequency-sparse convolutions, which can be…
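The two sparse variants named in the abstract can be illustrated by sparsifying the filter before an FFT convolution. This NumPy sketch rests on our own assumptions. We take a partial convolution to mean keeping only the first `keep` taps of the filter. We take a frequency-sparse convolution to mean zeroing every filter frequency bin at or above `keep_bins`, and which bins to drop is our choice, not the paper's. The function names are hypothetical. The sketch zeroes entries rather than skipping the corresponding work, so it shows the arithmetic but not the memory or compute savings.

```python
import numpy as np

def conv_with_filter_spectrum(u: np.ndarray, k_f: np.ndarray, n: int) -> np.ndarray:
    """Causal FFT convolution of u with a filter already given as a 2n-point rfft spectrum."""
    u_f = np.fft.rfft(u, n=2 * n)
    return np.fft.irfft(u_f * k_f, n=2 * n)[..., :n]

def partial_conv(u: np.ndarray, k: np.ndarray, keep: int) -> np.ndarray:
    """Partial convolution (assumed form): only the first `keep` filter taps are used."""
    n = u.shape[-1]
    k_trunc = np.zeros_like(k)
    k_trunc[..., :keep] = k[..., :keep]
    return conv_with_filter_spectrum(u, np.fft.rfft(k_trunc, n=2 * n), n)

def freq_sparse_conv(u: np.ndarray, k: np.ndarray, keep_bins: int) -> np.ndarray:
    """Frequency-sparse convolution (assumed form): filter bins >= keep_bins are zeroed."""
    n = u.shape[-1]
    k_f = np.fft.rfft(k, n=2 * n)   # n + 1 frequency bins
    k_f[..., keep_bins:] = 0        # drop the highest-frequency bins
    return conv_with_filter_spectrum(u, k_f, n)

rng = np.random.default_rng(1)
N = 512
u = rng.standard_normal(N)
k = rng.standard_normal(N)
# A partial convolution equals a direct convolution with the shortened filter.
print(np.allclose(partial_conv(u, k, 64), np.convolve(u, k[:64])[:N]))  # True
```

Keeping all N + 1 bins in `freq_sparse_conv`, or all N taps in `partial_conv`, recovers the dense FFT convolution exactly.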
