Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips

Ismet Dagli; Mehmet Belviranli

arXiv:2308.05869 · cs.DC · February 8, 2024

TL;DR

This paper introduces HaX-CoNN, a scheduling scheme for concurrent DNN inference on heterogeneous SoCs that optimizes performance by considering layer characteristics, shared memory contention, and inter-accelerator transitions.

Contribution

HaX-CoNN is a novel approach that effectively characterizes and maps DNN layers to accelerators, reducing memory contention and improving throughput and latency.

Findings

1. Reduces shared memory contention by up to 45%.
2. Improves latency by up to 32%.
3. Enhances total throughput by up to 29%.

Abstract

Two distinguishing features of state-of-the-art mobile and autonomous systems are that 1) multiple workloads, mainly deep neural network (DNN) inference, often run concurrently and continuously; and 2) they operate on shared-memory system-on-chips (SoCs) that embed heterogeneous accelerators tailored for specific operations. The state of the art lacks the efficient performance and resource management techniques necessary to either maximize total system throughput or minimize end-to-end workload latency. In this work, we propose HaX-CoNN, a novel scheme that characterizes and maps layers in concurrently executing DNN inference workloads to a diverse set of accelerators within an SoC. Our scheme uniquely takes per-layer execution characteristics, shared memory (SM) contention, and inter-accelerator transitions into account to find optimal schedules. We evaluate HaX-CoNN on NVIDIA Orin,…
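
To make the three ingredients named in the abstract concrete (per-layer execution time, shared memory contention, and inter-accelerator transition cost), the following is a minimal toy sketch and not the paper's formulation or solver. Everything in it is an assumption made for illustration: the accelerator names, the per-group timings, the constant contention slowdown, the fixed transition penalty, and the lock-step execution model in which both DNNs run their i-th layer group at the same time. It simply enumerates every mapping and keeps the cheapest.

```python
from itertools import product

# All values below are hypothetical and chosen only for illustration.
ACCELERATORS = ("GPU", "DLA")

# Per-layer-group execution time in ms on each accelerator, when running alone.
LAYER_TIMES_MS = {
    "dnn_a": [{"GPU": 2.0, "DLA": 3.0}, {"GPU": 1.0, "DLA": 4.0}],
    "dnn_b": [{"GPU": 2.0, "DLA": 2.5}, {"GPU": 3.0, "DLA": 6.0}],
}
TRANSITION_MS = 0.5        # assumed cost of moving a DNN to another accelerator
CONTENTION_SLOWDOWN = 1.2  # assumed slowdown when both accelerators share memory


def schedule_cost(map_a, map_b):
    """Total latency (ms) of a pair of per-group accelerator assignments."""
    total = 0.0
    for step, (acc_a, acc_b) in enumerate(zip(map_a, map_b)):
        t_a = LAYER_TIMES_MS["dnn_a"][step][acc_a]
        t_b = LAYER_TIMES_MS["dnn_b"][step][acc_b]
        if acc_a == acc_b:
            # Same accelerator: the two layer groups are serialized.
            total += t_a + t_b
        else:
            # Different accelerators: run in parallel, slowed by contention.
            total += CONTENTION_SLOWDOWN * max(t_a, t_b)
        if step > 0:
            # Charge one transition per DNN that switched accelerators.
            switches = (map_a[step - 1] != acc_a) + (map_b[step - 1] != acc_b)
            total += TRANSITION_MS * switches
    return total


def best_schedule():
    """Exhaustively search all mappings and return the cheapest pair."""
    n_groups = len(LAYER_TIMES_MS["dnn_a"])
    candidates = product(product(ACCELERATORS, repeat=n_groups), repeat=2)
    return min(candidates, key=lambda pair: schedule_cost(*pair))


if __name__ == "__main__":
    map_a, map_b = best_schedule()
    print("dnn_a:", map_a, "dnn_b:", map_b, "cost:", schedule_cost(map_a, map_b))
```

With these made-up numbers, the cheapest mapping moves the second DNN from one accelerator to the other partway through, which beats running everything on a single accelerator even after paying the transition penalty. Exhaustive enumeration only works at this toy scale; the search space grows exponentially with the number of layer groups.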

Code & Models

Repositories

ismetdagli/hax-conn (Official)

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques