Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU

Zhihe Zhao; Neiwen Ling; Nan Guan; Guoliang Xing

arXiv:2307.04339·cs.DC·July 11, 2023·2 cites

Open Access

TL;DR

Miriam is a framework that enables efficient, real-time multi-DNN inference on resource-constrained edge GPUs by dynamically managing kernel execution to meet diverse performance requirements.

Contribution

It introduces a novel contention-aware task coordination framework with elastic kernels and dynamic scheduling for multi-DNN inference on edge GPUs.

Findings

01

System throughput increased by 92% with Miriam.

02

Critical task latency overhead remains below 10%.

03

Effective resource management on edge GPUs demonstrated.

Abstract

Many applications, such as autonomous driving and augmented reality, require the concurrent running of multiple deep neural networks (DNNs) that pose different levels of real-time performance requirements. However, coordinating multiple DNN tasks with varying levels of criticality on edge GPUs remains an area of limited study. Unlike server-level GPUs, edge GPUs are resource-limited and lack hardware-level resource management mechanisms for avoiding resource contention. Therefore, we propose Miriam, a contention-aware task coordination framework for multi-DNN inference on edge GPUs. Miriam consolidates two main components, an elastic-kernel generator and a runtime dynamic kernel coordinator, to support mixed-criticality DNN inference. To evaluate Miriam, we build a new DNN inference benchmark based on CUDA with diverse representative DNN workloads. Experiments on two edge GPU platforms show…

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · IoT and Edge/Fog Computing