A Persistent-State Dataflow Accelerator for Memory-Bound Linear Attention Decode on FPGA
Neelesh Gupta, Peter Wang, Rajgopal Kannan, Viktor K. Prasanna

TL;DR
This paper introduces an FPGA-based accelerator that significantly speeds up memory-bound linear attention decoding by persistently storing recurrent states on-chip, achieving higher efficiency than GPUs.
Contribution
It presents a novel FPGA architecture that turns memory-bound linear attention decoding into a compute-bound workload by keeping the recurrent state persistently on-chip and processing it with a pipelined dataflow design.
Findings
Decodes at 63 microseconds per token.
Decodes 4.5 times faster than an NVIDIA H100 GPU.
Delivers up to 60 times greater energy efficiency than the GPU.
Abstract
Gated DeltaNet (GDN) is a linear attention mechanism that replaces the growing KV cache with a fixed-size recurrent state. Hybrid LLMs like Qwen3-Next use 75% GDN layers and achieve accuracy competitive with attention-only models. However, at batch size 1, GDN decode is memory-bound on GPUs because the full recurrent state must be round-tripped through HBM for every token. We show that this bottleneck is architectural, not algorithmic: all subquadratic sequence models exhibit arithmetic intensities below 1 FLOP/B at decode time, making them more memory-bound than standard Transformers. We present an FPGA accelerator that eliminates this bottleneck by holding the full 2 MB recurrent state persistently in on-chip BRAM, converting the workload from memory-bound to compute-bound. Our design fuses the GDN recurrence into a five-phase pipelined datapath that performs only one read and one write pass…
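To make the memory-bound argument concrete, here is a minimal Python sketch of one decode step for a single head, followed by a rough arithmetic-intensity estimate. It assumes the gated delta rule as it is commonly written in the Gated DeltaNet literature, S_t = α_t S_{t-1} (I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ with output o_t = S_t q_t. It is not the paper's five-phase datapath. The function names, the FLOP-counting convention, and the fp32 state (4 bytes per element) are illustrative assumptions.

```python
def gdn_decode_step(S, q, k, v, alpha, beta):
    """One gated delta rule step for a single head (illustrative sketch).

    S is a d_v x d_k state stored as a list of rows; q and k have length d_k,
    v has length d_v. Returns (new_state, output).
    Uses the identity alpha*S*(I - beta*k*k^T) + beta*v*k^T
                    = alpha*S + beta*(v - alpha*S*k)*k^T.
    """
    new_S, out = [], []
    for i in range(len(S)):
        row = [alpha * s for s in S[i]]               # decay the old state row
        u = sum(r * kj for r, kj in zip(row, k))      # value the decayed state predicts for k
        delta = beta * (v[i] - u)                     # scaled prediction error
        row = [r + delta * kj for r, kj in zip(row, k)]  # rank-1 correction
        new_S.append(row)
        out.append(sum(r * qj for r, qj in zip(row, q)))  # o = S_t q
    return new_S, out


def arithmetic_intensity(d_k, d_v, bytes_per_elem=4):
    """FLOPs per byte of state traffic when the state is read and written once.

    Assumed count per state element: 1 (decay) + 2 (S k) + 2 (rank-1 update)
    + 2 (S q) = 7 FLOPs. Traffic for q, k, v is ignored because it is small
    next to the state.
    """
    flops = 7 * d_k * d_v
    bytes_moved = 2 * d_k * d_v * bytes_per_elem  # one read pass + one write pass
    return flops / bytes_moved


if __name__ == "__main__":
    S0 = [[0, 0], [0, 0]]
    S1, o1 = gdn_decode_step(S0, q=[1, 0], k=[1, 0], v=[3, 5], alpha=1, beta=1)
    print(S1, o1)
    print(arithmetic_intensity(128, 128))
```

Under these assumptions the ratio does not depend on the head dimensions: every state element costs a fixed number of FLOPs and a fixed number of bytes, so a larger state does not raise the intensity. With an fp32 state the estimate is 0.875 FLOP/B, in line with the sub-1 FLOP/B regime the abstract describes, though the paper's own accounting may differ. Keeping the state on-chip removes the `bytes_moved` term from off-chip traffic, which is the effect the accelerator relies on.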
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Big Data and Digital Economy
