Enabling Long FFT Convolutions on Memory-Constrained FPGAs via Chunking

Peter Wang; Neelesh Gupta; Viktor Prasanna

arXiv:2601.06065·cs.LG·January 13, 2026

Open Access

TL;DR

This paper introduces a chunked FFT convolution method, with overlap-add reconstruction, that enables long-sequence convolutions on memory-limited FPGAs. Throughput scales proportionally with chunk size and degrades by only 7% for the longest sequences tested.

Contribution

The paper presents a novel chunking approach for FFT convolutions that enables convolutions of 450K-length sequences with 450K-length filters on an Alveo U200 FPGA with 2.8 MB of BRAM, with throughput that scales with chunk size and degrades minimally.

Findings

01

Throughput scales proportionally with chunk size.

02

Throughput degrades minimally, by 7%, for the longest sequences.

03

Enables deployment of long-context primitives on edge FPGAs.

Abstract

The need for long-context reasoning has led to neural network architectures that offer alternatives to Transformers and self-attention. A popular example is Hyena, which employs causal 1D convolutions implemented with FFTs. Long convolutions enable efficient global context mixing, but their memory requirements for intermediate results exceed the 2-3 MB Block RAM capacity of FPGAs. We present a chunked FFT convolution approach that enables convolutions of a 450K-length sequence with a 450K-length filter on an Alveo U200 FPGA with 2.8 MB of BRAM, using chunking and overlap-add reconstruction. We find that throughput scales proportionally with chunk size while degrading minimally, by 7%, for our longest sequences, demonstrating that careful memory management enables deployment of long-context primitives on edge FPGAs without sacrificing performance.
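
To illustrate the general idea of chunking with overlap-add reconstruction mentioned in the abstract, here is a minimal host-side sketch in NumPy. It is not the authors' FPGA design: the function name `chunked_fft_conv`, the `chunk` parameter, and the choice to chunk both the sequence and the filter are assumptions made for illustration. The point it shows is that every FFT operates on a fixed, small buffer whose size depends only on the chunk size, not on the sequence or filter length.

```python
import numpy as np

def chunked_fft_conv(x, h, chunk):
    """Full linear convolution of x and h via overlap-add.

    Hypothetical helper for illustration: both x and h are split into
    pieces of at most `chunk` samples, so no FFT is larger than n_fft.
    """
    # A power of two large enough to hold 2*chunk - 1 samples,
    # the length of the linear convolution of two chunks.
    n_fft = 1 << (2 * chunk - 1).bit_length()
    out = np.zeros(len(x) + len(h) - 1)

    # Spectra of the filter chunks. Precomputing them is a host-side
    # convenience; a memory-constrained design could stream them instead.
    h_specs = [np.fft.rfft(h[j:j + chunk], n_fft)
               for j in range(0, len(h), chunk)]

    for i in range(0, len(x), chunk):
        x_spec = np.fft.rfft(x[i:i + chunk], n_fft)
        len_x = min(chunk, len(x) - i)
        for k, h_spec in enumerate(h_specs):
            j = k * chunk
            len_h = min(chunk, len(h) - j)
            # Pointwise product in the frequency domain equals the
            # convolution of the two zero-padded chunks.
            seg = np.fft.irfft(x_spec * h_spec, n_fft)
            seg_len = len_x + len_h - 1
            # Overlap-add: the partial result starts at offset i + j.
            out[i + j:i + j + seg_len] += seg[:seg_len]
    return out
```

For a causal convolution of the kind Hyena uses, only the first `len(x)` output samples would be kept, for example `chunked_fft_conv(x, h, 128)[:len(x)]`. A larger `chunk` means fewer, larger FFTs per output sample, which is consistent in spirit with the reported dependence of throughput on chunk size, although this sketch says nothing about the actual hardware behavior.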

Taxonomy

Topics: Digital Filter Design and Implementation · Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices