QoS-Aware Placement of Deep Learning Services on the Edge with Multiple Service Implementations

Nathaniel Hudson; Hana Khamfroush; Daniel E. Lucani

arXiv:2104.15094·cs.NI·May 3, 2021

TL;DR

This paper addresses the challenge of optimally placing and scheduling multiple deep learning service implementations on edge devices to maximize user QoS, proposing efficient greedy algorithms with proven approximation guarantees.

Contribution

It formulates the joint placement and scheduling problem as an NP-hard integer program, proves its submodular structure, and introduces two greedy algorithms with theoretical and empirical performance guarantees.
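As a rough illustration of the greedy idea, the sketch below places model implementations on a single edge server by repeatedly adding the one with the largest marginal QoS gain that still fits in storage. It is not the paper's algorithm: the single-server setting, the per-model QoS score shared by all users, and every name (`total_qos`, `greedy_placement`, `storage_cost`, the example services) are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Model = Tuple[str, str]  # (service name, implementation name); hypothetical naming


def total_qos(placed: Set[Model], requests: List[str], qos: Dict[Model, float]) -> float:
    """Sum, over requests, of the best QoS offered by any placed model of that service."""
    total = 0.0
    for service in requests:
        total += max((qos[m] for m in placed if m[0] == service), default=0.0)
    return total


def greedy_placement(
    models: List[Model],
    storage_cost: Dict[Model, int],
    qos: Dict[Model, float],
    requests: List[str],
    capacity: int,
) -> Set[Model]:
    """Add the model with the largest marginal gain until nothing useful fits."""
    placed: Set[Model] = set()
    used = 0
    while True:
        base = total_qos(placed, requests, qos)
        best, best_gain = None, 0.0
        for m in models:
            # Skip models already placed or too large for the remaining storage.
            if m in placed or used + storage_cost[m] > capacity:
                continue
            gain = total_qos(placed | {m}, requests, qos) - base
            if gain > best_gain:
                best, best_gain = m, gain
        if best is None:  # no feasible model improves the objective
            return placed
        placed.add(best)
        used += storage_cost[best]


# Toy instance (all values invented for illustration).
models = [("detect", "small"), ("detect", "large"), ("speech", "small")]
storage_cost = {("detect", "small"): 1, ("detect", "large"): 3, ("speech", "small"): 1}
qos = {("detect", "small"): 0.6, ("detect", "large"): 0.9, ("speech", "small"): 0.7}
requests = ["detect", "detect", "speech"]

placement = greedy_placement(models, storage_cost, qos, requests, capacity=4)
print(sorted(placement))
```

The objective here takes the best available implementation per request, so adding a model can only help and helps less as more models are placed. That is the diminishing-returns behaviour that makes greedy selection a natural fit.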

Findings

01

The greedy algorithms achieve near-optimal solutions with over 90% accuracy.

02

The proposed methods outperform baseline approaches in synthetic and real-world scenarios.

03

Empirical results demonstrate effective QoS maximization in edge intelligence services.

Abstract

Mobile edge computing pushes computationally intensive services closer to the user, reducing delay through physical proximity. This has led many to consider deploying deep learning models on the edge, commonly known as edge intelligence (EI). EI services can have many model implementations that provide different QoS. For instance, one model can perform inference faster than another (thus reducing latency) while achieving lower accuracy. In this paper, we study joint service placement and model scheduling of EI services with the goal of maximizing Quality-of-Service (QoS) for end users, where EI services have multiple implementations to serve user requests, each with varying costs and QoS benefits. We cast the problem as an integer linear program and prove that it is NP-hard. We then prove the objective is equivalent to maximizing a monotone increasing, submodular…
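For readers unfamiliar with the terms in the last sentence, the standard definitions are given below. The symbols $f$ (a set function) and $V$ (the ground set, which could for example be the candidate model placements) are notation assumed here and are not taken from the paper.

```latex
% Monotone increasing: adding elements never decreases the value.
f(A) \le f(B) \qquad \text{for all } A \subseteq B \subseteq V.

% Submodular (diminishing returns): an element helps a smaller set
% at least as much as it helps a larger one.
f(A \cup \{e\}) - f(A) \;\ge\; f(B \cup \{e\}) - f(B)
\qquad \text{for all } A \subseteq B \subseteq V,\; e \in V \setminus B.
```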
