Soft GPGPUs for Embedded FPGAs: An Architectural Evaluation
Kevin Andryc, Tedy Thomas, Russell Tessier

TL;DR
This paper introduces a customizable soft GPGPU architecture for embedded FPGAs that executes compiled CUDA code without re-synthesizing the FPGA design. It achieves an average 44x speedup over a MicroBlaze soft processor and average dynamic energy savings of 80%.
Contribution
It presents a novel FPGA-based soft GPGPU architecture optimized for embedded systems. The architecture supports direct compilation of integer CUDA code and scales to configurations with multiple multiprocessors.
Findings
Average 44x speedup over a MicroBlaze soft processor
80% average dynamic energy savings compared to a soft-core processor
Application-specific customization reduces energy by 14%
Abstract
We present a customizable soft architecture which allows for the execution of GPGPU code on an FPGA without the need to recompile the design. Issues related to scaling the overlay architecture to multiple GPGPU multiprocessors are considered along with application-class architectural optimizations. The overlay architecture is optimized for FPGA implementation to support efficient use of embedded block memories and DSP blocks. This architecture supports direct CUDA compilation of integer computations to a binary which is executable on the FPGA-based GPGPU. The benefits of our architecture are evaluated for a collection of five standard CUDA benchmarks which are compiled using standard GPGPU compilation tools. An average speedup of 44x versus a MicroBlaze microprocessor is achieved. We show average dynamic energy savings of 80% versus a soft-core processor. Application-customized…
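The abstract describes an overlay that runs compiled CUDA integer code on one or more GPGPU multiprocessors. To make that execution model concrete, the following is a minimal Python sketch, assuming the standard CUDA SIMT model: a kernel indexed by block and thread IDs, thread blocks distributed across streaming multiprocessors (SMs), and threads issued in warps. Every name and parameter here (`NUM_SM`, `WARP_SIZE`, `BLOCK_DIM`, `GRID_DIM`, `launch`, `vec_add_kernel`) is hypothetical, and the round-robin block assignment is an illustrative assumption, not the scheduler described in the paper. The sketch does not model block memories, DSP blocks, or the compiled binary.

```python
from dataclasses import dataclass

# Hypothetical overlay parameters, chosen for illustration only;
# they are not configuration values reported in the paper.
NUM_SM = 2      # streaming multiprocessors instantiated in the overlay
WARP_SIZE = 32  # threads issued together in lockstep
BLOCK_DIM = 64  # threads per block (assumed to be a multiple of WARP_SIZE)
GRID_DIM = 4    # blocks per grid launch


@dataclass(frozen=True)
class ThreadCtx:
    """Per-thread view of the kernel, mirroring CUDA's blockIdx.x / threadIdx.x."""
    block_idx: int
    thread_idx: int


def vec_add_kernel(t, a, b, c):
    """Integer kernel body: c[i] = a[i] + b[i], in the style of a CUDA vector add."""
    i = t.block_idx * BLOCK_DIM + t.thread_idx
    if i < len(c):  # bounds guard, as in a typical CUDA kernel
        c[i] = a[i] + b[i]


def launch(kernel, a, b, c):
    """Emulate a grid launch on NUM_SM multiprocessors.

    Blocks are dealt round-robin to SMs (an assumption), and each SM steps
    through its block one warp at a time. On hardware the lanes of a warp
    execute in lockstep; here they run sequentially. Returns the number of
    blocks each SM executed.
    """
    assert BLOCK_DIM % WARP_SIZE == 0
    blocks_per_sm = [0] * NUM_SM
    for blk in range(GRID_DIM):
        sm = blk % NUM_SM  # static block-to-SM assignment
        blocks_per_sm[sm] += 1
        for warp in range(BLOCK_DIM // WARP_SIZE):
            for lane in range(WARP_SIZE):
                kernel(ThreadCtx(blk, warp * WARP_SIZE + lane), a, b, c)
    return blocks_per_sm


if __name__ == "__main__":
    n = BLOCK_DIM * GRID_DIM
    a = list(range(n))
    b = [2 * x for x in range(n)]
    c = [0] * n
    print(launch(vec_add_kernel, a, b, c))  # [2, 2]
    print(c[-1])                             # 765
```

Because the kernel only sees its block and thread indices, the same kernel runs unchanged when `NUM_SM` changes. For example, with `NUM_SM = 4` each SM would execute one of the four blocks. This loosely mirrors how a compiled CUDA binary can target overlays with different numbers of multiprocessors without being rewritten.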
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · VLSI and FPGA Design Techniques
