E-BATCH: Energy-Efficient and High-Throughput RNN Batching
Franyell Silfa, Jose Maria Arnau, and Antonio Gonzalez

TL;DR
E-BATCH is a batching scheme for RNN inference on RNN accelerators that improves throughput (by up to 2.1x) and energy efficiency (by up to 3.6x) by dynamically managing batch composition and size.
Contribution
E-BATCH introduces a runtime system and hardware support for energy-efficient, high-throughput RNN batching; together they reduce padding and adapt the batch size dynamically.
Findings
E-BATCH improves throughput by up to 2.1x.
E-BATCH enhances energy efficiency by up to 3.6x.
It outperforms state-of-the-art methods on E-PUR and TPU.
Abstract
Recurrent Neural Network (RNN) inference exhibits low hardware utilization due to the strict data dependencies across time-steps. Batching multiple requests can increase throughput. However, RNN batching requires a large amount of padding since the batched input sequences may largely differ in length. Schemes that dynamically update the batch every few time-steps avoid padding. However, they require executing different RNN layers in a short timespan, decreasing energy efficiency. Hence, we propose E-BATCH, a low-latency and energy-efficient batching scheme tailored to RNN accelerators. It consists of a runtime system and effective hardware support. The runtime concatenates multiple sequences to create large batches, resulting in substantial energy savings. Furthermore, the accelerator notifies it when the evaluation of a sequence is done, so that a new sequence can be immediately added…
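To make the padding cost described in the abstract concrete, the sketch below compares two ways of filling a fixed number of batch lanes with variable-length sequences. It is an illustration only, not the E-BATCH runtime: the function names `padded_slots` and `concatenated_slots`, the greedy "give the next sequence to the lane that frees up first" policy, and the sample lengths are all assumptions made for this example.

```python
import heapq


def padded_slots(lengths, batch_size):
    """Lane-time-steps used when each batch is padded to its longest sequence.

    Sequences are grouped in arrival order; every sequence in a group
    occupies its lane for as long as the longest sequence in that group.
    """
    total = 0
    for i in range(0, len(lengths), batch_size):
        group = lengths[i:i + batch_size]
        total += max(group) * len(group)
    return total


def concatenated_slots(lengths, lanes):
    """Lane-time-steps used when sequences are concatenated back to back.

    Hypothetical greedy policy: each new sequence is appended to the lane
    that finishes earliest. The cost is lanes * (time the last lane finishes).
    """
    finish = [0] * lanes          # per-lane finish time; all zeros is a valid heap
    for n in lengths:
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + n)
    return lanes * max(finish)


if __name__ == "__main__":
    lengths = [10, 2, 3, 9, 4, 8, 1, 7]   # made-up sequence lengths (time-steps)
    useful = sum(lengths)
    for name, used in [("padded", padded_slots(lengths, 4)),
                       ("concatenated", concatenated_slots(lengths, 4))]:
        print(f"{name}: {used} lane-steps, {used - useful} wasted")
```

Under these assumptions, concatenation wastes fewer lane-time-steps than padding whenever sequence lengths within a batch differ substantially, which is the effect the abstract attributes to concatenating sequences and refilling a lane as soon as a sequence finishes.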
