TL;DR
This paper introduces N3H-Core, an FPGA-based neural network accelerator that uses both DSP and LUT resources and is tuned with reinforcement learning to reduce latency and improve accuracy.
Contribution
It presents a heterogeneous FPGA accelerator with DSP and LUT cores, and a systematic framework for optimizing its design using reinforcement learning.
Findings
Latency reduced by 1.12-1.32x compared to the state-of-the-art design
Higher inference accuracy than that state-of-the-art design
Open-sourced on GitHub
Abstract
Accelerating neural network inference with FPGAs has emerged as a popular option, since the reconfigurability and high-performance computing capability of FPGAs intrinsically satisfy the computation demands of fast-evolving neural algorithms. However, popular neural accelerators on FPGA (e.g., Xilinx DPU) mainly use DSP resources to construct their processing units, while the rich LUT resources are not well exploited. In this work, we take a software-hardware co-design approach to develop an FPGA-based heterogeneous computing system for neural network acceleration. From the hardware perspective, the proposed accelerator consists of DSP- and LUT-based GEneral Matrix-Multiplication (GEMM) computing cores, which together form the entire computing system in a heterogeneous fashion. The DSP- and LUT-based GEMM cores are computed w.r.t. a unified Instruction Set Architecture (ISA)…
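To make the idea of a heterogeneous GEMM concrete, here is a minimal software sketch of one matrix multiplication divided between two compute cores. It is an illustration under stated assumptions, not the paper's implementation: the excerpt above does not say how work is partitioned, so the column-wise split, the `dsp_ratio` parameter, and all function names are hypothetical.

```python
def matmul(a, b):
    """Reference GEMM on nested lists: (m x k) @ (k x n)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def split_columns(b, ratio):
    """Split the columns of B in two; `ratio` is the share for the first core.

    Assumption: a column-wise split is used purely for illustration.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    cut = round(len(b[0]) * ratio)
    left = [row[:cut] for row in b]
    right = [row[cut:] for row in b]
    return left, right


def heterogeneous_gemm(a, b, dsp_ratio=0.5):
    """Compute A @ B as two independent partial GEMMs and stitch the results.

    The two matmul calls stand in for a hypothetical DSP-based core and a
    hypothetical LUT-based core working on disjoint output columns.
    """
    b_dsp, b_lut = split_columns(b, dsp_ratio)
    out_dsp = matmul(a, b_dsp)  # slice assigned to the "DSP core"
    out_lut = matmul(a, b_lut)  # slice assigned to the "LUT core"
    return [r_dsp + r_lut for r_dsp, r_lut in zip(out_dsp, out_lut)]


a = [[1, 2], [3, 4]]
b = [[5, 6, 7], [8, 9, 10]]
result = heterogeneous_gemm(a, b, dsp_ratio=0.5)
```

Because the two slices cover disjoint output columns, the stitched result matches a single full GEMM for any ratio. In this sketch the ratio only changes how much work each core receives, which makes it a natural knob for a design-space search.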
