NPE: An FPGA-based Overlay Processor for Natural Language Processing
Hamza Khan, Asma Khan, Zainab Khan, Lun Bin Huang, Kun Wang, Lei He

TL;DR
NPE is an FPGA-based overlay processor designed for NLP models like BERT, offering software-like programmability, real-time performance, and significant power and resource efficiency for edge applications.
Contribution
It introduces NPE, a flexible FPGA overlay that efficiently executes NLP models, enabling upgrades without reconfiguration and achieving superior power and resource efficiency.
Findings
NPE meets real-time latency targets for BERT.
NPE uses 4x less power than CPUs and 6x less than GPUs.
NPE uses 3x fewer FPGA resources than comparable accelerators.
Abstract
In recent years, transformer-based models have shown state-of-the-art results for Natural Language Processing (NLP). In particular, the introduction of the BERT language model brought with it breakthroughs in tasks such as question answering and natural language inference, advancing applications that allow humans to interact naturally with embedded devices. FPGA-based overlay processors have been shown to be effective solutions for edge image and video processing applications, which mostly rely on low-precision linear matrix operations. In contrast, transformer-based NLP techniques employ a variety of higher-precision nonlinear operations, and invoke them significantly more often. We present NPE, an FPGA-based overlay processor that can efficiently execute a variety of NLP models. NPE offers software-like programmability to the end user and, unlike FPGA designs that implement specialized…
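The abstract contrasts the linear matrix operations of vision workloads with the nonlinear operations transformers depend on. As a hedged illustration only (this is not NPE's implementation, which the abstract does not describe), the three nonlinear functions a BERT encoder layer applies can be written in plain Python: softmax in self-attention, GELU in the feed-forward block, and layer normalization after each sublayer.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(xs, gamma=1.0, beta=0.0, eps=1e-12):
    # Normalize to zero mean and unit variance, then scale and shift.
    # gamma/beta are scalars here for brevity; real models learn per-feature vectors.
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (x - mean) * inv_std + beta for x in xs]


if __name__ == "__main__":
    print(softmax([1.0, 2.0, 3.0]))
    print(gelu(1.0))
    print(layer_norm([1.0, 2.0, 3.0]))
```

Each function needs exponentials, `erf`, division, or square roots at floating-point precision, whereas a matrix multiply needs only multiply-accumulate. That difference is why such operations are awkward for accelerators built around low-precision linear arithmetic. The `eps` default of `1e-12` follows common BERT configurations; BERT implementations use either this erf-based GELU or a tanh approximation of it.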
