A Scalable Pipelined Dataflow Accelerator for Object Region Proposals on FPGA Platform
Wenzhi Fu, Jianlei Yang, Pengcheng Dai, Yiran Chen, Weisheng Zhao

TL;DR
This paper presents a scalable pipelined dataflow FPGA accelerator for object region proposals, reporting a 3.67x speedup over a desktop CPU and more than 250x better energy efficiency than an embedded ARM platform.
Contribution
It introduces a novel FPGA-based dataflow architecture with pipelined stages and tiered memory for efficient region proposal processing.
Findings
Achieves a 3.67x speedup over a desktop CPU
Improves energy efficiency by more than 250x over an embedded ARM platform
Effective pipeline design for real-time object detection
Abstract
Region proposal is critical for object detection, but it is usually a bottleneck for computational efficiency on traditional control-flow architectures. We observe that region proposal tasks are potentially well suited to pipelined parallelism through dataflow-driven acceleration. In this paper, a scalable pipelined dataflow accelerator is proposed for efficient region proposals on an FPGA platform. The accelerator processes image data in a streaming manner through three sequential stages: resizing, kernel computing, and sorting. First, a Ping-Pong cache strategy is adopted for rotation loading in the resize module to guarantee continuous output streaming. Then, a multiple-pipeline architecture with tiered memory is used in the kernel computing module to complete the main computation tasks. Finally, a bubble-pushing heap sort method is exploited in the sorting module to…
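The three-stage streaming structure described in the abstract can be sketched in software as a chain of Python generators. This is only an illustrative analogy under stated assumptions, not the paper's hardware design: the function names (`ping_pong_rows`, `score_stage`, `top_k_stage`, `propose`), the block size, and the use of `sum` as a scoring function are all hypothetical, and the standard-library `heapq.nlargest` stands in for the bubble-pushing heap sort, whose details are not given in the text above. In hardware the two buffers would be loaded and read concurrently; here the alternation is sequential.

```python
from heapq import nlargest
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple


def ping_pong_rows(rows: Iterable, block: int) -> Iterator[List]:
    """Group incoming rows into blocks, alternating between two buffers.

    While the consumer reads one buffer, the next rows are loaded into
    the other. A yielded block is only valid until the block after the
    next one is requested, because its buffer is then reused.
    """
    buffers: List[List] = [[], []]
    active = 0
    for row in rows:
        buffers[active].append(row)
        if len(buffers[active]) == block:
            full = buffers[active]
            active ^= 1              # switch loading to the other buffer
            buffers[active].clear()  # the consumer has finished with it
            yield full
    if buffers[active]:
        yield buffers[active]        # trailing partial block


def score_stage(blocks: Iterable[Sequence],
                score: Callable) -> Iterator[Tuple[float, int]]:
    """Hypothetical kernel stage: emit (score, row_index) per row."""
    index = 0
    for blk in blocks:
        for row in blk:
            yield score(row), index
            index += 1


def top_k_stage(scored: Iterable[Tuple[float, int]],
                k: int) -> List[Tuple[float, int]]:
    """Keep the k highest-scoring candidates (stand-in for the sorter)."""
    return nlargest(k, scored)


def propose(rows: Iterable, block: int, k: int,
            score: Callable = sum) -> List[Tuple[float, int]]:
    """Chain the three stages: buffering, scoring, top-k selection."""
    return top_k_stage(score_stage(ping_pong_rows(rows, block), score), k)


if __name__ == "__main__":
    image_rows = [[i, i] for i in range(7)]
    print(propose(image_rows, block=2, k=3))
```

Because each stage is a generator, a row flows through scoring and selection as soon as its block is complete, which mirrors the streaming behaviour the abstract describes without modelling any FPGA timing or memory hierarchy.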
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · CCD and CMOS Imaging Sensors
