HFRWKV: A High-Performance Fully On-Chip Hardware Accelerator for RWKV
Liu Shijie, Zeng Zhenghao, Jiao Han, Huang Yihua

TL;DR
HFRWKV is an FPGA-based hardware accelerator for RWKV, a modern RNN that processes long contexts at a linear memory cost but whose sequential computation makes poor use of GPU parallelism. The paper reports 63.48× higher throughput and 139.17× better energy efficiency than a CPU, and 32.33× higher throughput than a GPU.
Contribution
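The paper introduces HFRWKV, a fully on-chip FPGA accelerator for RWKV. It combines a hardware-friendly hybrid-precision quantization strategy for matrix operations with reusable modules for complex operations (exponentiation and division) built on lookup tables or piecewise linear approximation.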
Findings
63.48× throughput improvement over CPU
139.17× energy efficiency improvement over CPU
32.33× throughput improvement over GPU
Abstract
RWKV is a modern RNN architecture that approaches the performance of Transformers, with the advantage of processing long contexts at a linear memory cost. However, its sequential computation pattern struggles to efficiently leverage GPU parallelism, which leads to low compute resource utilization. Furthermore, frequent off-chip weight accesses create a memory bottleneck. To address these challenges, we propose HFRWKV, an FPGA-based hardware accelerator specifically designed for RWKV. Within the matrix operation module, we propose a novel hardware-friendly hybrid-precision quantization strategy, which enhances performance while maintaining acceptable accuracy. For the complex operations including exponentiation and division, we introduce a method featuring reusable architectures combined with lookup tables or piecewise linear approximation, which is algorithmically refined to effectively…
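The abstract mentions approximating exponentiation with lookup tables or piecewise linear approximation. As a rough illustration of that general technique only, the sketch below approximates exp(x) on a fixed interval with a table of per-segment slopes and intercepts. The input range, segment count, and function names are hypothetical choices for this example and are not taken from the paper; the actual HFRWKV design may segment and refine the function differently.

```python
import math

# Hypothetical parameters (not from the paper): input range and segment count.
X_MIN, X_MAX, SEGMENTS = -8.0, 0.0, 32
STEP = (X_MAX - X_MIN) / SEGMENTS


def build_table():
    """Precompute one (slope, intercept) pair per segment.

    Each segment is the straight line through exp at its two endpoints,
    so the table plays the role of a small lookup ROM.
    """
    table = []
    for i in range(SEGMENTS):
        x0 = X_MIN + i * STEP
        x1 = x0 + STEP
        y0, y1 = math.exp(x0), math.exp(x1)
        slope = (y1 - y0) / STEP
        intercept = y0 - slope * x0
        table.append((slope, intercept))
    return table


TABLE = build_table()


def pwl_exp(x):
    """Approximate exp(x) on [X_MIN, X_MAX]; inputs outside are clamped."""
    x = min(max(x, X_MIN), X_MAX)
    # Segment index: one table lookup, then one multiply and one add.
    idx = min(int((x - X_MIN) / STEP), SEGMENTS - 1)
    slope, intercept = TABLE[idx]
    return slope * x + intercept


def max_abs_error(samples=10001):
    """Largest absolute error against math.exp over an even sample grid."""
    worst = 0.0
    for k in range(samples):
        x = X_MIN + (X_MAX - X_MIN) * k / (samples - 1)
        worst = max(worst, abs(pwl_exp(x) - math.exp(x)))
    return worst
```

With these assumed parameters, each evaluation costs one table lookup, one multiplication, and one addition. For linear interpolation with segment width h, the error is bounded by h² · max|f''| / 8, which here is 0.25² / 8 ≈ 0.0078. Using more segments lowers the error at the cost of a larger table.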
Taxonomy
Topics: Network Packet Processing and Optimization · Parallel Computing and Optimization Techniques · Interconnection Networks and Systems
