Scalable Light-Weight Integration of FPGA Based Accelerators with Chip Multi-Processors
Zhe Lin, Sharad Sinha, Hao Liang, Liang Feng, Wei Zhang

TL;DR
This paper presents a scalable, lightweight architecture for integrating FPGA-based accelerators with chip multiprocessors, enhancing performance and scalability in heterogeneous multicore systems.
Contribution
It introduces architectural support for efficient FPGA-processor integration, consisting of distributed packet receivers, hierarchical packet senders, and an accelerator chaining mechanism.
Findings
High performance demonstrated through FPGA prototyping
Architecture is scalable and lightweight
Effective intra-FPGA data reuse reduces communication overhead
Abstract
Modern multicore systems are migrating from homogeneous systems to heterogeneous systems with accelerator-based computing in order to overcome the barriers of performance and power walls. In this trend, FPGA-based accelerators are becoming increasingly attractive, due to their excellent flexibility and low design cost. In this paper, we propose architectural support for efficient interfacing between FPGA-based multi-accelerators and chip-multiprocessors (CMPs) connected through the network-on-chip (NoC). Distributed packet receivers and hierarchical packet senders are designed to maintain scalability and reduce the critical path delay under a heavy task load. A dedicated accelerator chaining mechanism is also proposed to facilitate intra-FPGA data reuse among accelerators to circumvent prohibitive communication overhead between the FPGA and processors. In order to evaluate the…
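The benefit of accelerator chaining described in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's hardware design: the function names (`run_unchained`, `run_chained`), the sample stages, and the idea of counting one "crossing" per processor-FPGA transfer are all hypothetical simplifications introduced here for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical model: an accelerator is just a function over a list of ints.
Stage = Callable[[List[int]], List[int]]


def run_unchained(stages: List[Stage], data: List[int]) -> Tuple[List[int], int]:
    """Each accelerator returns its result to the processor before the next runs."""
    crossings = 0
    for stage in stages:
        crossings += 1      # processor -> FPGA: send input for this stage
        data = stage(data)
        crossings += 1      # FPGA -> processor: return the intermediate result
    return data, crossings


def run_chained(stages: List[Stage], data: List[int]) -> Tuple[List[int], int]:
    """Intermediate results stay on the FPGA and feed the next accelerator."""
    crossings = 1           # processor -> FPGA: send the initial input once
    for stage in stages:
        data = stage(data)  # reused inside the FPGA, no processor round trip
    crossings += 1          # FPGA -> processor: return only the final result
    return data, crossings


# Illustrative accelerator stages (assumed, not taken from the paper).
def scale(xs: List[int]) -> List[int]:
    return [2 * x for x in xs]


def offset(xs: List[int]) -> List[int]:
    return [x + 1 for x in xs]


if __name__ == "__main__":
    pipeline = [scale, offset, scale]
    print(run_unchained(pipeline, [1, 2, 3]))
    print(run_chained(pipeline, [1, 2, 3]))
```

In this simplified model both paths compute the same result, but the unchained path incurs two crossings per stage while the chained path incurs two in total, which is the kind of communication saving the chaining mechanism is intended to provide.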
