Inner Loop Optimizations in Mapping Single Threaded Programs to Hardware
Madhav Desai

TL;DR
This paper introduces a dynamic loop-pipelining control mechanism for hardware implementation of single-threaded programs, demonstrating significant performance improvements when combined with loop unrolling on FPGA targets.
Contribution
It presents a novel control-flow mechanism for dynamic loop-pipelining in hardware, enhancing performance of inner loops in single-threaded programs, especially when combined with static loop unrolling.
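To see why keeping several iterations active at once matters, a back-of-the-envelope cycle count can help. The sketch below is not the paper's mechanism: it assumes a simplified model with a fixed body latency and a fixed initiation interval, whereas the scheme described here is dynamic. The function names and the numbers in the usage example are hypothetical.

```python
def sequential_cycles(n_iters: int, body_latency: int) -> int:
    """Cycles when each iteration starts only after the previous one finishes."""
    return n_iters * body_latency


def pipelined_cycles(n_iters: int, body_latency: int, initiation_interval: int) -> int:
    """Cycles when a new iteration is launched every `initiation_interval` cycles.

    Assumption: the interval is constant. A dynamic scheme would let it vary
    with data dependences and resource availability at run time.
    """
    if n_iters == 0:
        return 0
    # The first iteration pays the full latency; each later one adds one interval.
    return body_latency + (n_iters - 1) * initiation_interval


if __name__ == "__main__":
    n, latency, ii = 64, 8, 2  # made-up illustrative values, not from the paper
    seq = sequential_cycles(n, latency)
    pipe = pipelined_cycles(n, latency, ii)
    print(f"sequential: {seq} cycles, pipelined: {pipe} cycles")
```

Under this model the benefit grows with the ratio of body latency to initiation interval, which is the intuition behind overlapping iterations of an inner loop.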
Findings
Dynamic loop-pipelining improves hardware performance significantly.
Combining loop unrolling with pipelining yields 6X to 20X speedup.
Performance/cost ratio is improved despite hardware overhead.
Abstract
In the context of mapping high-level algorithms to hardware, we consider the basic problem of generating an efficient hardware implementation of a single-threaded program, in particular, that of an inner loop. We describe a control-flow mechanism which provides dynamic loop-pipelining capability in hardware, so that multiple iterations of an arbitrary inner loop can be made simultaneously active in the generated hardware. We study the impact of this loop-pipelining scheme in conjunction with source-level loop-unrolling. In particular, we apply this technique to some common loop kernels: regular kernels such as the fast Fourier transform and matrix multiplication, as well as an example of an inner loop whose body has branching. The resulting hardware descriptions are synthesized to an FPGA target, and then characterized for performance and resource utilization. We observe that…
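Source-level loop unrolling, as mentioned in the abstract, can be illustrated on the dot product at the heart of a matrix-multiplication inner loop. This is a minimal sketch only: the unroll factor of 4, the use of separate accumulators, and the function names are assumptions made for illustration, not details taken from the paper.

```python
def dot(a, b):
    """Reference inner loop: one multiply-accumulate per iteration."""
    acc = 0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc


def dot_unrolled4(a, b):
    """Same computation with the loop body unrolled by a factor of 4.

    Four independent accumulators expose four multiply-accumulates per
    iteration that have no dependence on each other, which is what gives
    a hardware mapping more work to run in parallel.
    """
    n = len(a)
    acc0 = acc1 = acc2 = acc3 = 0
    i = 0
    while i + 4 <= n:
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
        i += 4
    acc = acc0 + acc1 + acc2 + acc3
    # Remainder loop for lengths that are not a multiple of 4.
    while i < n:
        acc += a[i] * b[i]
        i += 1
    return acc


if __name__ == "__main__":
    x = list(range(10))
    y = list(range(10, 20))
    print(dot(x, y), dot_unrolled4(x, y))
```

Integer inputs are used so both versions agree exactly; with floating-point data the reordered additions could differ in the last bits.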
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Low-power high-performance VLSI design
