Soft GPGPU versus IP cores: Quantifying and Reducing the Performance Gap
Martin Langhammer (1, 2), George A. Constantinides (2) ((1) Intel Corporation, (2) Imperial College London)

TL;DR
This paper evaluates the performance gap between soft GPGPUs and specialized IP cores on FFT tasks, proposing architectural improvements that significantly enhance soft GPU efficiency, making them competitive for FPGA-based digital signal processing.
Contribution
The paper quantifies the performance gap and introduces two novel architectural features for eGPU that improve FFT efficiency by 50%, reducing the performance-area gap with specialized IP cores.
Findings
eGPU achieves high clock frequencies (>750 MHz) and a small footprint.
The modified eGPU takes only 3x the performance-area product of a specialized IP core.
eGPU demonstrates superior efficiency compared to Nvidia A100 GPGPUs for FFTs.
Abstract
eGPU, a recently reported soft GPGPU for FPGAs, has demonstrated very high clock frequencies (more than 750 MHz) and a small footprint. This means that, for the first time, commercial soft processors may be competitive for the kind of heavy numerical computations common in FPGA-based digital signal processing. In this paper we take a deep dive into the performance of the eGPU family on FFT computation, in order to quantify the performance gap between state-of-the-art soft processors and commercial IP cores specialized for this task. In the process, we propose two novel architectural features for the eGPU that improve the efficiency of the design by 50% when executing the FFTs. The end result is that our modified GPGPU takes only 3 times the performance-area product of a specialized IP core, yet as a programmable processor is able to execute arbitrary software-defined algorithms. Further…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Network Packet Processing and Optimization · Algorithms and Data Compression
