MARS: Exploiting Multi-Level Parallelism for DNN Workloads on Adaptive Multi-Accelerator Systems

Guan Shen; Jieru Zhao; Zeke Wang; Zhe Lin; Wenchao Ding; Chentao Wu; Quan Chen; Minyi Guo

arXiv:2307.12234·cs.DC·July 25, 2023

TL;DR

MARS is a mapping framework for DNN workloads on multi-accelerator systems. It combines computation-aware accelerator selection with communication-aware sharding, reducing latency by 32.2% on average for typical DNN workloads compared to the baseline.

Contribution

The paper introduces MARS, a mapping framework that combines computation-aware accelerator selection with communication-aware sharding strategies for DNN workloads.

Findings

01

Achieves a 32.2% average latency reduction on typical DNN workloads compared to the baseline.

02

Reduces latency by 59.4% on heterogeneous models compared to the corresponding state-of-the-art method.

03

Outperforms both the baseline and the compared state-of-the-art method in multi-accelerator DNN mapping.

Abstract

Along with the fast evolution of deep neural networks, the hardware system is also developing rapidly. As a promising solution achieving high scalability and low manufacturing cost, multi-accelerator systems widely exist in data centers, cloud platforms, and SoCs. Thus, a challenging problem arises in multi-accelerator systems: selecting a proper combination of accelerators from available designs and searching for efficient DNN mapping strategies. To this end, we propose MARS, a novel mapping framework that can perform computation-aware accelerator selection, and apply communication-aware sharding strategies to maximize parallelism. Experimental results show that MARS can achieve 32.2% latency reduction on average for typical DNN workloads compared to the baseline, and 59.4% latency reduction on heterogeneous models compared to the corresponding state-of-the-art method.
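
For readers who prefer speedup factors, the reported latency reductions can be converted with a small calculation. The sketch below assumes each percentage is measured relative to the latency of the method being compared against; the helper name `speedup_from_reduction` is hypothetical, and the speedup figures are derived here rather than reported in the abstract.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a relative latency reduction (in percent) to a speedup factor.

    Assumes new_latency = old_latency * (1 - reduction_pct / 100),
    so speedup = old_latency / new_latency.
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# The two percentages come from the abstract; the labels are shortened.
for label, pct in [("typical DNN workloads", 32.2), ("heterogeneous models", 59.4)]:
    factor = speedup_from_reduction(pct)
    print(f"{label}: {pct}% lower latency is about a {factor:.2f}x speedup")
```

Under that assumption, a 32.2% reduction corresponds to roughly a 1.47x speedup and a 59.4% reduction to roughly 2.46x.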

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Brain Tumor Detection and Classification