E-BATCH: Energy-Efficient and High-Throughput RNN Batching

Franyell Silfa, Jose Maria Arnau, and Antonio Gonzalez

arXiv:2009.10656·cs.DC·September 23, 2020

TL;DR

E-BATCH is a novel batching scheme for RNN inference that significantly improves energy efficiency and throughput by dynamically managing batch composition and size, tailored for RNN accelerators.

Contribution

The paper introduces a runtime system and hardware support for energy-efficient, high-throughput RNN batching. Together they reduce padding and adapt the batch size dynamically.

Findings

01

E-BATCH improves throughput by up to 2.1x.

02

E-BATCH enhances energy efficiency by up to 3.6x.

03

E-BATCH outperforms state-of-the-art batching methods on E-PUR and TPU.

Abstract

Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large amount of padding since the batched input sequences may largely differ in length. Schemes that dynamically update the batch every few time-steps avoid padding. However, they require executing different RNN layers in a short timespan, decreasing energy efficiency. Hence, we propose E-BATCH, a low-latency and energy-efficient batching scheme tailored to RNN accelerators. It consists of a runtime system and effective hardware support. The runtime concatenates multiple sequences to create large batches, resulting in substantial energy savings. Furthermore, the accelerator notifies it when the evaluation of a sequence is done, so that a new sequence can be immediately added…
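The padding problem described in the abstract can be made concrete with a toy model. The sketch below is not the E-BATCH algorithm: it ignores RNN layers, energy, and the accelerator's completion notification, and only compares two slot-level policies. The function names `padded_batching` and `refill_batching` are hypothetical, and the model assumes one time-step per sequence element and a fixed number of batch slots.

```python
import heapq
from typing import List, Tuple


def padded_batching(lengths: List[int], batch_size: int) -> Tuple[int, int]:
    """Static batching: each batch runs until its longest sequence ends.

    Returns (time_steps, wasted_slot_steps), where wasted slot-steps are
    the padded positions of the shorter sequences in each batch.
    """
    time_steps = 0
    wasted = 0
    for start in range(0, len(lengths), batch_size):
        group = lengths[start:start + batch_size]
        longest = max(group)
        time_steps += longest
        # Every sequence in the group occupies its slot for `longest` steps.
        wasted += longest * len(group) - sum(group)
    return time_steps, wasted


def refill_batching(lengths: List[int], batch_size: int) -> Tuple[int, int]:
    """Toy refill policy: a slot that finishes a sequence takes the next one.

    Sequences assigned to the same slot run back to back (concatenated),
    so no padding is inserted between them. Returns (time_steps,
    wasted_slot_steps), where waste is idle time at the tail of the run.
    """
    # Min-heap holding the time-step at which each slot becomes free.
    slot_free_at = [0] * batch_size
    heapq.heapify(slot_free_at)
    for length in lengths:
        free_at = heapq.heappop(slot_free_at)  # earliest-free slot
        heapq.heappush(slot_free_at, free_at + length)
    time_steps = max(slot_free_at)
    wasted = time_steps * batch_size - sum(lengths)
    return time_steps, wasted
```

For sequence lengths `[10, 2, 3, 9, 1, 4]` and a batch size of 2, the padded policy in this model takes 23 time-steps with 17 padded slot-steps, while the refill policy takes 15 time-steps with 1 idle slot-step. These figures describe only the toy model and say nothing about the paper's measured results.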
