Enabling Long FFT Convolutions on Memory-Constrained FPGAs via Chunking
Peter Wang, Neelesh Gupta, Viktor Prasanna

TL;DR
This paper introduces a chunked FFT convolution method that runs long-sequence convolutions (450K-length sequence with a 450K-length filter) on a memory-limited FPGA, with throughput degrading by only 7% at the longest sequence lengths.
Contribution
The paper presents a chunking approach with overlap-add reconstruction for FFT convolutions whose intermediate results exceed on-chip Block RAM, demonstrated on an Alveo U200 FPGA with 2.8 MB BRAM.
Findings
Throughput scales proportionally with chunk size.
Throughput degrades by only 7% for the longest sequences.
Enables deployment of long-context primitives on edge FPGAs.
Abstract
The need for long-context reasoning has led to alternative neural network architectures besides Transformers and self-attention, a popular model being Hyena, which employs causal 1D-convolutions implemented with FFTs. Long convolutions enable efficient global context mixing, but requirements for intermediate results exceed the 2-3 MB Block RAM capacity of FPGAs. We present a chunked FFT convolution approach enabling 450K length sequence by 450K length filter convolutions on an Alveo U200 FPGA with 2.8 MB BRAM through chunking and overlap-add reconstruction. We find that throughput scales proportionally with chunk size while degrading minimally by 7% for our longest sequences, demonstrating that careful memory management enables deployment of long-context primitives on edge FPGAs without sacrificing performance.
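The chunking and overlap-add reconstruction described in the abstract can be illustrated with a minimal software sketch. This is not the authors' FPGA implementation: it assumes both the sequence and the filter are split into fixed-size chunks, each chunk pair is convolved with a small zero-padded FFT, and the partial results are accumulated at the matching output offset. All function names (`fft`, `fft_conv`, `chunked_causal_conv`, `direct_causal_conv`) and the `chunk` parameter are hypothetical, chosen only for illustration.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unscaled); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_conv(x, h):
    """Linear convolution of two short real blocks via a zero-padded FFT."""
    out_len = len(x) + len(h) - 1
    n = 1
    while n < out_len:  # next power of two that avoids circular wrap-around
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], invert=True)
    return [v.real / n for v in y[:out_len]]  # apply the 1/n inverse scaling


def chunked_causal_conv(x, h, chunk):
    """Causal convolution y[t] = sum_k h[k] * x[t - k] for t < len(x).

    Assumption: only one pair of chunks (and its small FFT buffers) needs to
    be resident at a time; partial results are overlap-added into y.
    """
    y = [0.0] * len(x)
    for i in range(0, len(x), chunk):
        xb = x[i:i + chunk]
        for j in range(0, len(h), chunk):
            if i + j >= len(x):
                break  # remaining filter chunks fall past the causal output
            part = fft_conv(xb, h[j:j + chunk])
            for t, v in enumerate(part):
                if i + j + t < len(x):
                    y[i + j + t] += v  # overlap-add at offset i + j
    return y


def direct_causal_conv(x, h):
    """O(N^2) reference used only to check the chunked result."""
    return [
        sum(h[k] * x[t - k] for k in range(min(t, len(h) - 1) + 1))
        for t in range(len(x))
    ]


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
h = [0.5, 0.25, -1.0, 2.0, 0.0, 1.0, -0.5]
y_chunked = chunked_causal_conv(x, h, chunk=4)
y_direct = direct_causal_conv(x, h)
assert all(abs(a - b) < 1e-9 for a, b in zip(y_chunked, y_direct))
```

In this sketch each FFT has at most `2 * chunk` points, independent of the total sequence length, which is the property that lets the working set fit in a small on-chip memory. The trade-off is that the number of chunk-pair FFTs grows as the sequence and filter get longer.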
Taxonomy
Topics: Digital Filter Design and Implementation · Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices
