Heterogeneous Dataflow Accelerators for Multi-DNN Workloads

Hyoukjun Kwon; Liangzhen Lai; Michael Pellauer; Tushar Krishna; Yu-Hsin Chen; Vikas Chandra

arXiv:1909.07437·cs.DC·December 18, 2020

TL;DR

This paper introduces heterogeneous dataflow accelerators (HDAs) with multiple sub-accelerators for diverse DNN workloads, offering a balance of flexibility, energy efficiency, and area cost, outperforming fixed and reconfigurable accelerators.

Contribution

It proposes HDAs supporting multiple dataflows, along with Herald for co-optimizing hardware partitioning and scheduling, demonstrating significant performance and energy benefits.
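
To make the scheduling idea concrete, here is a minimal sketch of assigning layers to sub-accelerators with different dataflows. It is not Herald's algorithm, which the summary above does not detail. It is a simple greedy heuristic under stated assumptions: the layers are treated as independent (for example, drawn from different models), the sub-accelerator names `A` and `B` are hypothetical, and the per-layer latencies are made-up illustrative numbers, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class SubAccelerator:
    """A hypothetical sub-accelerator with one fixed dataflow."""
    name: str
    free_at: float = 0.0  # time at which this sub-accelerator becomes idle


def schedule(layers, latency, accelerators):
    """Greedy heuristic: place each layer, in the given order, on the
    sub-accelerator that would finish it earliest.

    Assumption: layers have no dependencies on each other.
    """
    assignment = []
    for layer in layers:
        best = min(accelerators,
                   key=lambda a: a.free_at + latency[(layer, a.name)])
        best.free_at += latency[(layer, best.name)]
        assignment.append((layer, best.name))
    makespan = max(a.free_at for a in accelerators)
    return assignment, makespan


# Illustrative (made-up) latencies: each layer prefers a different dataflow.
layers = ["conv_early", "conv_late", "fc"]
latency = {
    ("conv_early", "A"): 2.0, ("conv_early", "B"): 5.0,
    ("conv_late", "A"): 6.0, ("conv_late", "B"): 3.0,
    ("fc", "A"): 4.0, ("fc", "B"): 1.0,
}

assignment, makespan = schedule(
    layers, latency, [SubAccelerator("A"), SubAccelerator("B")]
)

# Baseline for comparison: run every layer on a single fixed dataflow.
fixed_a = sum(latency[(layer, "A")] for layer in layers)
fixed_b = sum(latency[(layer, "B")] for layer in layers)
print(assignment, makespan, fixed_a, fixed_b)
```

In this toy setting the heterogeneous schedule finishes sooner than either single-dataflow baseline because each layer runs where it is cheapest and the two sub-accelerators work in parallel. A real co-optimizer would also have to choose how hardware resources are partitioned between sub-accelerators and respect layer dependencies, which this sketch leaves out.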

Findings

01. HDA architecture reduces latency by 65.3% compared to fixed dataflow accelerators.

02. HDA achieves 5.0% lower energy consumption than fixed dataflow accelerators.

03. Maelstrom, an HDA design, is 22.0% more energy-efficient than state-of-the-art reconfigurable accelerators.

Abstract

Emerging AI-enabled applications such as augmented/virtual reality (AR/VR) leverage multiple deep neural network (DNN) models for sub-tasks such as object detection, hand tracking, and so on. Because of the diversity of the sub-tasks, the layers within and across the DNN models are highly heterogeneous in operation and shape. Such layer heterogeneity is a challenge for a fixed dataflow accelerator (FDA) that employs a fixed dataflow on a single accelerator substrate since each layer prefers different dataflows (computation order and parallelization) and tile sizes. Reconfigurable DNN accelerators (RDAs) have been proposed to adapt their dataflows to diverse layers to address the challenge. However, the dataflow flexibility in RDAs is enabled at the area and energy costs of expensive hardware structures (switches, controller, etc.) and per-layer reconfiguration. Alternatively, this…
