Towards Power Efficient DNN Accelerator Design on Reconfigurable Platform

Rourab Paul; Sreetama Sarkar; Suman Sau; Koushik Chakraborty; Sanghamitra Roy; Amlan Chakrabarti

arXiv:2102.06888·cs.AR·February 15, 2022

Open Access

TL;DR

This paper presents an ultra-low-power FPGA implementation of a TPU for edge applications, using voltage scaling and partitioning strategies to improve energy efficiency while maintaining performance.

Contribution

It introduces a novel FPGA partitioning and voltage biasing scheme for TPU acceleration, enabling energy savings through static and runtime calibration methods.

Findings

01

Significant power reduction demonstrated in FPGA TPU implementation.

02

Effective partitioning based on slack values improves timing and energy efficiency.

03

Simulation results confirm the viability of voltage scaled TPU in FPGA platforms.

Abstract

The rapid emergence of Field Programmable Gate Arrays (FPGAs) has accelerated research on hardware implementations of Deep Neural Networks (DNNs). Among DNN processors, domain-specific architectures such as Google's Tensor Processing Unit (TPU) have outperformed conventional GPUs. However, implementations of TPUs in reconfigurable hardware should emphasize energy savings to meet green computing requirements. Voltage scaling, a popular approach to energy savings, is delicate in FPGAs because it may cause timing failures if not applied appropriately. In this work, we present an ultra-low-power FPGA implementation of a TPU for edge applications. We divide the systolic array of a TPU into different FPGA partitions, where each partition uses a different near-threshold (NTC) biasing voltage to run its FPGA cores. The biasing voltage for each partition is roughly…
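The partitioning idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the equal-size grouping rule, the linear voltage interpolation, the function names `partition_by_slack` and `assign_voltages`, and all slack and voltage numbers are assumptions made only for illustration. The sketch groups the processing elements (PEs) of a systolic array by timing slack and gives partitions with more slack a lower supply voltage, on the assumption that they can tolerate more delay.

```python
from typing import Dict, List, Tuple

PE = Tuple[int, int]  # (row, column) of a processing element in the systolic array


def partition_by_slack(slacks: Dict[PE, float], n_partitions: int) -> List[List[PE]]:
    """Sort PEs by ascending slack and split them into near-equal contiguous groups.

    The equal-size split is a hypothetical rule chosen for simplicity.
    """
    ordered = sorted(slacks, key=lambda pe: slacks[pe])
    size, extra = divmod(len(ordered), n_partitions)
    parts: List[List[PE]] = []
    start = 0
    for i in range(n_partitions):
        end = start + size + (1 if i < extra else 0)
        parts.append(ordered[start:end])
        start = end
    return parts


def assign_voltages(n_partitions: int, v_high: float, v_low: float) -> List[float]:
    """Interpolate linearly: the lowest-slack partition gets v_high, the highest gets v_low.

    Linear interpolation is an assumption, not a calibration procedure from the paper.
    """
    if n_partitions == 1:
        return [v_high]
    step = (v_high - v_low) / (n_partitions - 1)
    return [v_high - i * step for i in range(n_partitions)]


# Hypothetical slack values (ns) for a 2x2 array; not measured data.
example_slacks = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.2, (1, 1): 0.3}
example_parts = partition_by_slack(example_slacks, n_partitions=2)
example_volts = assign_voltages(len(example_parts), v_high=1.0, v_low=0.5)
```

In a real flow the slack values would come from the FPGA vendor's static timing report, and each candidate voltage would have to be checked against timing before use, since the abstract notes that careless voltage scaling can cause timing failure.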

Taxonomy

Topics: Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques · Low-power high-performance VLSI design