ProTEA: Programmable Transformer Encoder Acceleration on FPGA

Ehsan Kabir; Jason D. Bakos; David Andrews; Miaoqing Huang

arXiv:2409.13975 · cs.AR · September 24, 2024

Open Access

TL;DR

ProTEA is a programmable FPGA-based accelerator for transformer encoder models. By optimizing parallelism and matrix tiling, it runs 2.5× faster than an NVIDIA Titan XP GPU and 1.3–2.8× faster than existing FPGA accelerators.

Contribution

This paper presents ProTEA, a flexible, runtime-programmable FPGA accelerator specifically optimized for dense transformer encoder computations, with novel tiling strategies for improved performance.
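As a rough illustration of what matrix tiling means in general, the sketch below multiplies two matrices one fixed-size tile at a time. It is not ProTEA's tiling strategy, which this summary does not describe; the function name `matmul_tiled` and the `tile` parameter are hypothetical and chosen only for this example.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), visiting the operands tile by tile.

    Generic blocked multiplication for illustration only; the tile size
    and loop order are assumptions, not taken from the paper.
    """
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    # The three outer loops walk over tiles of the output and of the shared dimension.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # The inner loops accumulate one tile-sized partial product.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = out[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        out[i][j] = acc
    return out
```

On hardware, each tile would typically be sized to fit on-chip memory so that the inner loops can run in parallel; here the tiles only change the order in which the same products are accumulated.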

Findings

01 ProTEA achieves 2.5× faster inference than NVIDIA Titan XP.

02 ProTEA outperforms current FPGA accelerators by 1.3–2.8×.

03 ProTEA supports a wide range of transformer models with near-optimal performance.

Abstract

Transformer neural networks (TNNs) have been widely used in a diverse range of applications, including natural language processing (NLP), machine translation, and computer vision (CV). Their widespread adoption has been driven primarily by the exceptional performance of their multi-head self-attention block, which extracts key features from sequential data. The multi-head self-attention block is followed by feedforward neural networks, which play a crucial role in introducing non-linearity to help the model learn complex patterns. Despite the popularity of TNNs, only a limited number of hardware accelerators target these two critical blocks. Most prior works have concentrated on sparse architectures that are not flexible for popular TNN variants. This paper introduces ProTEA, a runtime programmable accelerator tailored for the dense computations of most…
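The two blocks named in the abstract can be sketched in plain Python as follows. This is a minimal reference for the dense computation only, assuming standard scaled dot-product attention and a ReLU feedforward network; biases, residual connections and layer normalization are omitted, and all function and parameter names are hypothetical rather than taken from the paper.

```python
import math

def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(x, wq, wk, wv):
    """Scaled dot-product self-attention for a single head."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]
    return matmul([softmax(row) for row in scores], v)

def multi_head_attention(x, heads, wo):
    """Run every head, concatenate the per-token outputs, then project with wo."""
    outs = [attention_head(x, wq, wk, wv) for wq, wk, wv in heads]
    concat = [sum((o[t] for o in outs), []) for t in range(len(x))]
    return matmul(concat, wo)

def feed_forward(x, w1, w2):
    """Two dense layers with a ReLU in between (the activation is an assumption)."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)
```

Nearly all of the work here is in `matmul`, which is why an accelerator for these two blocks is largely an accelerator for dense matrix products.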

Taxonomy

Topics: Analog and Mixed-Signal Circuit Design · Embedded Systems Design Techniques · Digital Filter Design and Implementation

Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings