EdgeServing: Deadline-Aware Multi-DNN Serving at the Edge

Jiahe Cao; Xiaomeng Li; Qiang Liu; Tao Han; Ning Zhang; Weisong Shi

arXiv:2605.05527 · cs.DC · May 8, 2026

TL;DR

EdgeServing is a deadline-aware system for multi-DNN serving at the edge that optimizes GPU sharing and scheduling to improve latency predictability and reduce SLO violations.

Contribution

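EdgeServing pairs time-division GPU sharing with early-exit inference and introduces a stability score that quantifies how each candidate scheduling decision affects future queue status, so the scheduler can weigh system-wide SLO impact rather than local heuristics.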

Findings

01. Outperforms baselines in SLO violation ratio and P95 latency.

02. Uses early-exit inference to expand scheduling options under latency constraints.

03. Achieves consistent improvements across multiple hardware platforms.
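
The two metrics named in the first finding can be made concrete with a short sketch. It assumes the conventional definitions: the SLO violation ratio is the fraction of requests whose end-to-end latency exceeds the SLO, and P95 latency is the nearest-rank 95th percentile. The listing does not state how the paper computes either metric, so the function names and the single shared SLO threshold here are illustrative assumptions.

```python
def slo_violation_ratio(latencies_ms, slo_ms):
    """Fraction of requests whose latency exceeds the SLO (assumed definition)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    violations = sum(1 for latency in latencies_ms if latency > slo_ms)
    return violations / len(latencies_ms)


def p95_latency(latencies_ms):
    """Nearest-rank 95th percentile of the observed latencies."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

With 20 samples, for example, the nearest-rank P95 is the 19th smallest latency, so a single slow outlier does not move it but two do.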

Abstract

As edge computing expands, serving multiple deep neural network (DNN) models on a single shared GPU has become a common yet challenging scenario, where each scheduling decision affects the tail latency of all concurrent queues. Existing schedulers rely on local heuristics and fail to capture this global impact, while GPU spatial-sharing approaches sacrifice latency predictability. In this paper, we propose EdgeServing, a deadline-aware multi-DNN serving system for edge devices. EdgeServing adopts time-division GPU sharing with early-exit inference for high inference predictability, and introduces a stability score to quantify how each candidate scheduling decision impacts the future queue status. At runtime, it cohesively selects the model, exit point, and batch size to minimize predicted system-wide SLO impact. Experimental results on multiple hardware platforms show that EdgeServing…
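
The joint choice of model, exit point, and batch size described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the stability score is not defined in this listing, so the code substitutes a simple stand-in, namely the number of queued requests across all models that are predicted to miss their deadlines if a candidate runs next. The linear latency profile, the `Candidate` type, and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    model: str
    exit_point: int  # index into the model's exit list; higher means deeper
    batch_size: int


def latency_ms(profile, cand):
    # Assumed linear profile: (fixed cost, per-request cost) for each exit.
    base, per_item = profile[cand.model][cand.exit_point]
    return base + per_item * cand.batch_size


def predicted_misses(queues, profile, cand, now_ms):
    """Stand-in impact score: deadline misses predicted across all queues."""
    finish = now_ms + latency_ms(profile, cand)
    misses = 0
    for model, deadlines in queues.items():
        served = cand.batch_size if model == cand.model else 0
        # Requests in the chosen batch miss if the batch finishes too late.
        misses += sum(1 for d in deadlines[:served] if finish > d)
        # Waiting requests: optimistic bound, served alone at the earliest exit.
        base, per_item = profile[model][0]
        misses += sum(1 for d in deadlines[served:] if finish + base + per_item > d)
    return misses


def choose(queues, profile, now_ms, max_batch=4):
    """Pick the candidate with the fewest predicted misses.

    Ties prefer deeper exits, then larger batches. Each queue is assumed to
    hold absolute deadlines in ascending order.
    """
    best_key, best = None, None
    for model, deadlines in queues.items():
        for exit_point in range(len(profile[model])):
            for batch in range(1, min(max_batch, len(deadlines)) + 1):
                cand = Candidate(model, exit_point, batch)
                key = (predicted_misses(queues, profile, cand, now_ms),
                       -exit_point, -batch)
                if best_key is None or key < best_key:
                    best_key, best = key, cand
    return best
```

Because only one batch occupies the GPU at a time in this sketch, each candidate's finish time is predictable from the profile alone, which is what lets the score account for every queue rather than just the one being served. When the head deadline of a queue is tight, the deeper exit is predicted to miss it, and the earlier exit becomes the only candidate with zero predicted misses.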
