Orloj: Predictably Serving Unpredictable DNNs

Peifeng Yu; Yuqing Qiu; Xin Jin; Mosharaf Chowdhury

arXiv:2209.00159·cs.DC·September 2, 2022

Open Access

TL;DR

Orloj is a serving system for dynamic DNNs, such as NLP models and CV models that skip layers, whose request execution times vary widely with the input. It improves finish rate and SLO compliance on these workloads while keeping performance on static DNN workloads comparable to the state of the art.

Contribution

Orloj schedules dynamic DNN requests using empirical distributions of expected request execution times rather than precise per-request execution times, and outperforms state-of-the-art serving solutions on high-variance dynamic DNN workloads.

Findings

01 Outperforms state-of-the-art serving solutions by 51-80% in finish rate on high-variance dynamic DNN workloads under tight SLOs

02 Improves finish rate by over 100% under more relaxed SLOs

03 Maintains comparable performance on static DNN workloads

Abstract

Existing DNN serving solutions can provide tight latency SLOs while maintaining high throughput via careful scheduling of incoming requests, whose execution times are assumed to be highly predictable and data-independent. However, inference requests to emerging dynamic DNNs -- e.g., popular natural language processing (NLP) models and computer vision (CV) models that skip layers -- are data-dependent. They exhibit poor performance when served using existing solutions because they experience large variance in request execution times depending on the input -- the longest request in a batch inflates the execution times of the smaller ones, causing SLO misses in the absence of careful batching. In this paper, we present Orloj, a dynamic DNN serving system that captures this variance in dynamic DNNs using empirical distributions of expected request execution times, and then efficiently…
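The batching effect described in the abstract can be illustrated with a small sketch. This is not Orloj's scheduler, whose details are not given here. It only shows, under stated assumptions, how an empirical distribution of per-request execution times can be used to reason about a batch whose latency is set by its slowest request. The function names, the sample values, and the assumption that requests in a batch are drawn independently from the empirical distribution are all hypothetical.

```python
from bisect import bisect_right


def batch_latency(per_request_ms):
    """Assumption: a batch finishes only when its slowest request finishes."""
    return max(per_request_ms)


def expected_batch_latency(samples_ms, batch_size):
    """Expected latency of a batch of `batch_size` requests, each drawn
    independently (with replacement) from the empirical samples.

    With sorted samples x_1 <= ... <= x_n, the maximum of k draws lands on
    x_i with probability (i/n)^k - ((i-1)/n)^k.
    """
    xs = sorted(samples_ms)
    n = len(xs)
    expected = 0.0
    for i, x in enumerate(xs, start=1):
        prob_max_is_x = (i / n) ** batch_size - ((i - 1) / n) ** batch_size
        expected += x * prob_max_is_x
    return expected


def slo_miss_probability(samples_ms, batch_size, budget_ms):
    """Probability that at least one request in the batch exceeds the
    budget, which (under the assumption above) makes the whole batch late."""
    xs = sorted(samples_ms)
    within_budget = bisect_right(xs, budget_ms)  # samples <= budget
    return 1.0 - (within_budget / len(xs)) ** batch_size


# Illustrative (made-up) execution-time samples in milliseconds.
samples = [10, 20, 30, 40]
print(batch_latency([10, 40]))                 # 40
print(expected_batch_latency(samples, 1))      # 25.0
print(expected_batch_latency(samples, 2))      # 31.25
print(slo_miss_probability(samples, 2, 30))    # 0.4375
```

In this toy setting, going from a batch of one to a batch of two raises the expected latency from 25 ms to 31.25 ms even though the per-request distribution is unchanged, which is the inflation effect the abstract attributes to high-variance dynamic DNNs.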

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · IoT and Edge/Fog Computing