Orloj: Predictably Serving Unpredictable DNNs
Peifeng Yu, Yuqing Qiu, Xin Jin, Mosharaf Chowdhury

TL;DR
Orloj is a serving system for dynamic DNNs that handles high variance in request execution times. It improves throughput and SLO compliance for dynamic models, such as NLP models and CV models that skip layers, while remaining competitive on static models.
Contribution
Orloj schedules dynamic DNN requests using empirical distributions of request execution times, so it does not need to know each request's exact execution time in advance, and it outperforms existing serving solutions.
Findings
Outperforms state-of-the-art serving solutions by 51-80% in finish rate under tight SLOs
Improves finish rate by over 100% under relaxed SLOs
Maintains comparable performance on static DNN workloads
Abstract
Existing DNN serving solutions can provide tight latency SLOs while maintaining high throughput via careful scheduling of incoming requests, whose execution times are assumed to be highly predictable and data-independent. However, inference requests to emerging dynamic DNNs -- e.g., popular natural language processing (NLP) models and computer vision (CV) models that skip layers -- are data-dependent. They exhibit poor performance when served using existing solutions because they experience large variance in request execution times depending on the input -- the longest request in a batch inflates the execution times of the smaller ones, causing SLO misses in the absence of careful batching. In this paper, we present Orloj, a dynamic DNN serving system that captures this variance in dynamic DNNs using empirical distributions of expected request execution times, and then efficiently…
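The batching effect described in the abstract can be illustrated with a small sketch. It is not Orloj's scheduler. It assumes, for illustration only, that a batch takes as long as its longest request and that requests in a batch are independent draws from an empirical sample of past execution times. The function names and the sample values are hypothetical.

```python
from bisect import bisect_right


def expected_batch_latency(samples, batch_size):
    """Expected latency of a batch, assuming (illustratively) that the batch
    runs as long as its longest request and that each request is an
    independent draw from the empirical distribution given by `samples`."""
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        # P(max equals the i-th smallest sample) = (i/n)^k - ((i-1)/n)^k
        total += x * ((i / n) ** batch_size - ((i - 1) / n) ** batch_size)
    return total


def miss_probability(samples, batch_size, deadline):
    """Probability that the batch exceeds `deadline` under the same
    assumptions: at least one request in the batch is longer than it."""
    xs = sorted(samples)
    frac_within = bisect_right(xs, deadline) / len(xs)
    return 1.0 - frac_within ** batch_size


if __name__ == "__main__":
    # Hypothetical execution times in milliseconds: mostly short, one long.
    observed_ms = [10, 10, 10, 40]
    for k in (1, 2, 4):
        print(k, expected_batch_latency(observed_ms, k),
              miss_probability(observed_ms, k, deadline=20))
```

With these sample values, growing the batch from one request to two raises the expected batch latency from 17.5 ms to 23.125 ms and the chance of exceeding a 20 ms deadline from 0.25 to 0.4375, which is the kind of inflation the abstract attributes to the longest request in a batch.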
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · IoT and Edge/Fog Computing
