Automatic Nested Loop Acceleration on FPGAs Using Soft CGRA Overlay

Cheng Liu; Ho-Cheung Ng; Hayden Kwok-Hay So

arXiv:1509.00042 · cs.AR · September 2, 2015 · 20 cites

Open Access

TL;DR

This paper presents an automatic framework for customizing FPGA-based soft CGRA overlays to accelerate nested loops, significantly improving performance and productivity compared to non-customized solutions.

Contribution

It introduces an automated method for optimizing overlay architecture and compilation parameters for nested loop acceleration on FPGAs, improving accelerator performance at the cost of a customization step that takes 10-20 minutes.

Findings

1. Up to 5x speedup over non-customized accelerators.
2. Up to 10x speedup compared to host-only software execution.
3. Customization process takes 10-20 minutes.

Abstract

Offloading compute-intensive nested loops to execute on FPGA accelerators has been demonstrated by numerous researchers as an effective performance enhancement technique across many application domains. To construct such accelerators with high design productivity, researchers have increasingly turned to overlay architectures as an intermediate generation target built on top of off-the-shelf FPGAs. However, achieving the desired performance-overhead trade-off remains a major productivity challenge, because it requires complex application-specific customizations over a large design space covering multiple architectural parameters. In this work, an automatic nested loop acceleration framework utilizing a regular soft coarse-grained reconfigurable array (SCGRA) overlay is presented. Given high-level resource constraints, the framework automatically customizes the overlay…

Taxonomy

Topics: Embedded Systems Design Techniques · Interconnection Networks and Systems · Parallel Computing and Optimization Techniques