HG-PIPE: Vision Transformer Acceleration with Hybrid-Grained Pipeline
Qingyu Guo, Jiayong Wan, Songqiang Xu, Meng Li, Yuan Wang

TL;DR
HG-PIPE is an FPGA-based Vision Transformer (ViT) accelerator that combines a hybrid-grained pipeline architecture with approximation techniques to improve throughput and resource efficiency over prior FPGA accelerators.
Contribution
Introduces HG-PIPE, a hybrid-grained pipelined FPGA accelerator for ViT that reduces buffer costs and pipeline bubbles, achieving high throughput and resource efficiency.
Findings
2.78x throughput improvement over prior FPGA accelerators
7118 images/sec end-to-end ViT processing on VCK190 FPGA
2.81x faster than V100 GPU for ViT inference
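As a rough sanity check, the findings above can be converted into an average per-image interval and into the baseline throughputs the two ratios imply. This is a minimal sketch that assumes both ratios are throughput ratios measured on the same ViT workload as the 7118 images/sec figure. The helper names are hypothetical, and the implied baselines are derived here, not reported by the authors.

```python
# Back-of-the-envelope arithmetic on the reported findings.
# Assumption (not stated in the source): the 2.78x and 2.81x ratios are
# throughput ratios on the same ViT workload as the 7118 images/sec figure.

HG_PIPE_IMAGES_PER_SEC = 7118.0
SPEEDUP_VS_PRIOR_FPGA = 2.78
SPEEDUP_VS_V100 = 2.81


def interval_us(images_per_sec: float) -> float:
    """Average time between completed images, in microseconds (inverse throughput)."""
    return 1e6 / images_per_sec


def implied_baseline(images_per_sec: float, speedup: float) -> float:
    """Baseline throughput implied by a reported speedup ratio (hypothetical helper)."""
    return images_per_sec / speedup


if __name__ == "__main__":
    print(f"HG-PIPE interval:   {interval_us(HG_PIPE_IMAGES_PER_SEC):.1f} us/image")
    print(f"implied prior FPGA: {implied_baseline(HG_PIPE_IMAGES_PER_SEC, SPEEDUP_VS_PRIOR_FPGA):.0f} images/sec")
    print(f"implied V100:       {implied_baseline(HG_PIPE_IMAGES_PER_SEC, SPEEDUP_VS_V100):.0f} images/sec")
```

Under these assumptions, HG-PIPE completes an image roughly every 140 microseconds. That interval is inverse throughput, not per-image latency: a pipelined design keeps several images in flight at once.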
Abstract
Vision Transformer (ViT) acceleration with field-programmable gate arrays (FPGAs) is promising but challenging. Existing FPGA-based ViT accelerators mainly rely on temporal architectures, which process different operators by reusing the same hardware blocks and suffer from extensive memory access overhead. Pipelined architectures, either coarse-grained or fine-grained, unroll the ViT computation spatially for memory access efficiency. However, they usually suffer from significant hardware resource constraints and pipeline bubbles induced by the global computation dependency of ViT. In this paper, we introduce HG-PIPE, a pipelined FPGA accelerator for high-throughput and low-latency ViT processing. HG-PIPE features a hybrid-grained pipeline architecture to reduce on-chip buffer cost and couples the computation dataflow and parallelism design to eliminate the pipeline bubbles. HG-PIPE…
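The pipeline-bubble problem described in the abstract can be illustrated with a toy cycle model. This sketch assumes every stage handles one token per cycle and treats a stage with a global dependency (for example, one that needs every token's key before it can emit any attention output) as unable to emit anything until its whole input has arrived. It uses 197 tokens, the sequence length of ViT-B/16 at 224x224 resolution (196 patch tokens plus a class token). The stage kinds and function names are hypothetical; this is not HG-PIPE's actual scheduling.

```python
from typing import List


def stream_stage(arrivals: List[int]) -> List[int]:
    """Streaming stage: emits each token one cycle after both its arrival
    and the previous emission, i.e. at most one token per cycle."""
    out: List[int] = []
    for t in arrivals:
        prev = out[-1] if out else -1
        out.append(max(t, prev) + 1)
    return out


def global_stage(arrivals: List[int]) -> List[int]:
    """Global-dependency stage: emits nothing until every input token has
    arrived, then one token per cycle. It must buffer the whole sequence."""
    start = arrivals[-1] + 1
    return [start + i for i in range(len(arrivals))]


def completion_cycle(num_tokens: int, stages: str) -> int:
    """Cycle at which the last token leaves the pipeline.

    stages is a string of 'S' (streaming) and 'G' (global) stage kinds, in order.
    """
    times = list(range(num_tokens))  # token i enters the pipeline at cycle i
    for kind in stages:
        if kind == "S":
            times = stream_stage(times)
        elif kind == "G":
            times = global_stage(times)
        else:
            raise ValueError(f"unknown stage kind: {kind!r}")
    return times[-1]


if __name__ == "__main__":
    n = 197  # ViT-B/16 at 224x224: 196 patch tokens + 1 class token
    print("all streaming:   ", completion_cycle(n, "SSSS"))
    print("one global stage:", completion_cycle(n, "SSGS"))
```

In this toy model, four streaming stages finish at cycle 200, while making the third stage global pushes completion to cycle 396. The extra N - 1 = 196 cycles are the bubble during which downstream stages sit idle while the global stage buffers the full sequence, which is also why such stages raise on-chip buffer cost.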
