A Many-Core Overlay for High-Performance Embedded Computing on FPGAs
Mário Véstias, Horácio Neto

TL;DR
This paper introduces a configurable many-core overlay architecture for FPGAs that simplifies hardware design while maintaining high performance for embedded computing tasks like matrix multiplication, LU decomposition, and FFT.
Contribution
It presents a flexible, configurable many-core overlay that reduces hardware complexity and is adaptable to various computational kernels on FPGA platforms.
Findings
Delivers good performance on matrix multiplication, LU decomposition, and FFT on a ZYNQ-7020 FPGA
Reduces hardware design complexity
Demonstrates flexibility and configurability of the overlay
Abstract
In this work, we propose a configurable many-core overlay for high-performance embedded computing. The size of internal memory, the supported operations, and the number of ports can be configured independently for each core of the overlay. The overlay was evaluated with matrix multiplication, LU decomposition, and Fast Fourier Transform (FFT) on a ZYNQ-7020 FPGA platform. The results show that using a system-level many-core overlay avoids complex hardware design and still provides good performance results.
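To make the per-core configurability concrete, the following is a minimal sketch of how such a configuration might be described in software. It is not the authors' tool or format: the names `CoreConfig`, `memory_words`, `operations`, `ports`, and both helper functions are hypothetical, and the example values are illustrative only, not taken from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical description of one overlay core (all field names are assumptions)."""
    memory_words: int            # size of the core's internal memory, in words
    operations: FrozenSet[str]   # operations this core supports
    ports: int                   # number of ports on this core

    def __post_init__(self) -> None:
        # Reject configurations that cannot describe a usable core.
        if self.memory_words <= 0:
            raise ValueError("memory_words must be positive")
        if self.ports <= 0:
            raise ValueError("ports must be positive")
        if not self.operations:
            raise ValueError("a core must support at least one operation")


def total_memory_words(cores: List[CoreConfig]) -> int:
    """Sum the internal memory of all cores in the overlay."""
    return sum(core.memory_words for core in cores)


def cores_supporting(cores: List[CoreConfig], operation: str) -> List[int]:
    """Return the indices of the cores that support the given operation."""
    return [i for i, core in enumerate(cores) if operation in core.operations]


# Illustrative overlay with two differently configured cores (made-up values).
example_overlay = [
    CoreConfig(memory_words=1024, operations=frozenset({"add", "mul"}), ports=2),
    CoreConfig(memory_words=512, operations=frozenset({"add", "mul", "div"}), ports=1),
]
```

Because each core carries its own settings, a kernel that needs division (as LU decomposition does) could be mapped only to the cores returned by `cores_supporting(example_overlay, "div")`, while the remaining cores stay smaller.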
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Photonic and Optical Devices · VLSI and FPGA Design Techniques
