Serving DNNs like Clockwork: Performance Predictability from the Bottom Up
Arpan Gujarati, Reza Karimi, Safya Alzayat, Wei Hao, Antoine Kaufmann, Ymir Vigfusson, Jonathan Mace

TL;DR
This paper introduces Clockwork, a distributed DNN model serving system that leverages the deterministic nature of inference times to achieve highly predictable low-latency performance at scale.
Contribution
It presents a novel design methodology and system implementation that ensures end-to-end latency predictability for DNN inference serving.
Findings
Supports thousands of models while meeting 100 ms latency targets for 99.9999% of requests
Achieves tight request-level SLOs and strong performance isolation between models
Demonstrates that predictable inference times enable reliable performance guarantees
Abstract
Machine learning inference is becoming a core building block for interactive web applications. As a result, the underlying model serving systems on which these applications depend must consistently meet low latency targets. Existing model serving architectures use well-known reactive techniques to alleviate common-case sources of latency, but cannot effectively curtail tail latency caused by unpredictable execution times. Yet the underlying execution times are not fundamentally unpredictable - on the contrary we observe that inference using Deep Neural Network (DNN) models has deterministic performance. Here, starting with the predictable execution times of individual DNN inferences, we adopt a principled design methodology to successively build a fully distributed model serving system that achieves predictable end-to-end performance. We evaluate our implementation, Clockwork, using…
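To make the abstract's central idea concrete, here is a minimal sketch of how deterministic execution times could support proactive, SLO-aware scheduling: if each model's inference time is known in advance, a server can tell before running a request whether it will finish within its latency budget. This is an illustration under stated assumptions, not Clockwork's actual scheduler. The names `Request`, `admit`, and `predicted_ms`, the single-worker FIFO policy, and the latency figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    model: str         # which model to run (hypothetical identifier)
    arrival_ms: float  # arrival time in milliseconds
    slo_ms: float      # latency budget measured from arrival


def admit(requests, predicted_ms, start_ms=0.0):
    """Plan a single-worker FIFO schedule from predicted execution times.

    A request is accepted only if its predicted finish time falls within
    its deadline (arrival + SLO); otherwise it is rejected up front
    instead of being executed and finishing late.
    """
    clock = start_ms  # time at which the worker next becomes free
    accepted, rejected = [], []
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        begin = max(clock, req.arrival_ms)
        finish = begin + predicted_ms[req.model]
        if finish <= req.arrival_ms + req.slo_ms:
            accepted.append((req, finish))
            clock = finish  # the worker is busy until this request completes
        else:
            rejected.append(req)  # predicted to miss its deadline
    return accepted, rejected


# Assumed per-model latencies, for illustration only.
predicted = {"resnet": 30.0, "bert": 80.0}
reqs = [
    Request("resnet", arrival_ms=0.0, slo_ms=100.0),
    Request("bert", arrival_ms=5.0, slo_ms=100.0),
    Request("resnet", arrival_ms=10.0, slo_ms=100.0),
]
accepted, rejected = admit(reqs, predicted)
```

In this toy run the `bert` request would start at 30 ms and finish at 110 ms, past its 105 ms deadline, so it is rejected before consuming any worker time, while both `resnet` requests are accepted. The decision is only as good as the predictions, which is why the paper's observation of deterministic inference performance matters.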
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Advanced Memory and Neural Computing · EEG and Brain-Computer Interfaces
