HFRWKV: A High-Performance Fully On-Chip Hardware Accelerator for RWKV

Liu Shijie; Zeng Zhenghao; Jiao Han; Huang Yihua

arXiv:2601.02135·cs.AR·January 6, 2026

TL;DR

HFRWKV is an FPGA-based hardware accelerator for RWKV, a modern RNN that processes long contexts at linear memory cost but whose sequential computation underutilizes GPU parallelism. It reports 63.48× throughput and 139.17× energy efficiency improvements over a CPU, and a 32.33× throughput improvement over a GPU.

Contribution

The paper introduces HFRWKV, a fully on-chip FPGA accelerator with novel hybrid-precision quantization and optimized complex operation modules for RWKV.

Findings

01. 63.48× throughput improvement over CPU

02. 139.17× energy efficiency over CPU

03. 32.33× throughput improvement over GPU

Abstract

RWKV is a modern RNN architecture that approaches the performance of Transformers, with the advantage of processing long contexts at a linear memory cost. However, its sequential computation pattern struggles to efficiently leverage GPU parallelism, which leads to low compute resource utilization. Furthermore, frequent off-chip weight accesses create a memory bottleneck. To address these challenges, we propose HFRWKV, an FPGA-based hardware accelerator specifically designed for RWKV. Within the matrix operation module, we propose a novel hardware-friendly hybrid-precision quantization strategy, which enhances performance while maintaining acceptable accuracy. For the complex operations including exponentiation and division, we introduce a method featuring reusable architectures combined with lookup tables or piecewise linear approximation, which is algorithmically refined to effectively…
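To make the piecewise linear approximation idea from the abstract more concrete, here is a minimal software sketch of one common way to approximate exponentiation with a small table of line segments. It is not the HFRWKV design: the range reduction through powers of two, the segment count of 8, and the names `build_pwl_table` and `pwl_exp` are all assumptions chosen for illustration, and the sketch uses floating point where hardware would typically use fixed point.

```python
import math

LOG2E = math.log2(math.e)   # exp(x) == 2 ** (x * LOG2E)
NUM_SEGMENTS = 8            # hypothetical table size, not taken from the paper


def build_pwl_table(num_segments=NUM_SEGMENTS):
    """Return (slope, intercept) pairs approximating 2**f on [0, 1).

    Each equal-width segment is the chord through the exact values of
    2**f at the segment endpoints.
    """
    table = []
    for i in range(num_segments):
        f0 = i / num_segments
        f1 = (i + 1) / num_segments
        y0, y1 = 2.0 ** f0, 2.0 ** f1
        slope = (y1 - y0) / (f1 - f0)
        intercept = y0 - slope * f0
        table.append((slope, intercept))
    return table


PWL_TABLE = build_pwl_table()


def pwl_exp(x, table=PWL_TABLE):
    """Approximate exp(x) with a table lookup, one multiply, and one add."""
    t = x * LOG2E
    n = math.floor(t)                 # integer part: becomes a power-of-two scale
    f = t - n                         # fractional part in [0, 1)
    idx = min(int(f * len(table)), len(table) - 1)
    slope, intercept = table[idx]
    # ldexp(m, n) computes m * 2**n, the software analogue of a shift.
    return math.ldexp(slope * f + intercept, n)


if __name__ == "__main__":
    for x in (-4.0, -1.0, 0.5, 3.0):
        approx, exact = pwl_exp(x), math.exp(x)
        print(f"x={x:+.1f}  approx={approx:.6f}  exact={exact:.6f}")
```

The design choice worth noting is that splitting the exponent into integer and fractional parts confines the approximation to a single short interval, so a small table suffices regardless of input range. Doubling the number of segments roughly quarters the approximation error, which is the usual trade-off between table storage and accuracy.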

Taxonomy

Topics: Network Packet Processing and Optimization · Parallel Computing and Optimization Techniques · Interconnection Networks and Systems