Clipper: A Low-Latency Online Prediction Serving System

Daniel Crankshaw; Xin Wang; Giulio Zhou; Michael J. Franklin; Joseph E. Gonzalez; Ion Stoica

arXiv:1612.03079 · cs.DC · March 1, 2017 · 81 cites

TL;DR

Clipper is a general-purpose, low-latency prediction serving system that supports real-time machine learning deployment by combining caching, batching, and adaptive model selection, without modifying the underlying machine learning frameworks.
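Caching is one of the techniques named above. The following is a minimal sketch of a prediction cache. It assumes an LRU eviction policy and a cache key of (model name, model version, input). The class `PredictionCache` and its `fetch` method are hypothetical names chosen for illustration, not Clipper's API.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Tuple

CacheKey = Tuple[str, int, Hashable]


class PredictionCache:
    """Hypothetical LRU cache of predictions keyed by (model, version, input)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: "OrderedDict[CacheKey, object]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, model: str, version: int, x: Hashable,
              predict: Callable[[Hashable], object]) -> object:
        """Return a cached prediction, calling `predict` only on a miss."""
        key = (model, version, x)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        y = predict(x)  # cache miss: evaluate the model
        self._entries[key] = y
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return y
```

Because the model version is part of the key, a redeployed model never serves predictions that were cached for its predecessor.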

Contribution

Introduces Clipper, a modular prediction serving system that reduces latency and improves throughput and robustness across a range of machine learning frameworks, without requiring changes to the models.
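A modular architecture that leaves models unchanged can be pictured as a thin adapter per framework, where every adapter exposes the same batch-prediction call. The sketch below assumes such an interface. `ModelContainer`, `predict_batch`, the two adapters, and `Router` are all hypothetical names, and the toy models stand in for real frameworks.

```python
from typing import Callable, Dict, List, Protocol, Sequence

Vector = List[float]


class ModelContainer(Protocol):
    """Hypothetical uniform interface that every framework adapter exposes."""

    def predict_batch(self, inputs: Sequence[Vector]) -> List[float]: ...


class LinearAdapter:
    """Stands in for one framework: a linear model given by a weight vector."""

    def __init__(self, weights: Vector) -> None:
        self.weights = weights

    def predict_batch(self, inputs: Sequence[Vector]) -> List[float]:
        return [sum(w * x for w, x in zip(self.weights, row)) for row in inputs]


class ThresholdAdapter:
    """Stands in for another framework: wraps an arbitrary scoring function."""

    def __init__(self, score: Callable[[Vector], float], threshold: float) -> None:
        self.score = score
        self.threshold = threshold

    def predict_batch(self, inputs: Sequence[Vector]) -> List[float]:
        return [1.0 if self.score(row) > self.threshold else 0.0 for row in inputs]


class Router:
    """Routes queries by model name and relies only on predict_batch."""

    def __init__(self) -> None:
        self.models: Dict[str, ModelContainer] = {}

    def register(self, name: str, container: ModelContainer) -> None:
        self.models[name] = container

    def predict(self, name: str, inputs: Sequence[Vector]) -> List[float]:
        return self.models[name].predict_batch(inputs)
```

The router never inspects the model's framework, so adding a framework means writing one adapter rather than changing the serving layer or the model.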

Findings

1. Achieves low-latency predictions on benchmark datasets.
2. Demonstrates performance comparable to TensorFlow Serving.
3. Enables model composition and online learning for improved accuracy.
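A multi-armed bandit is one way to illustrate adaptive model selection and online learning from feedback. The sketch below assumes the Exp3 algorithm, which chooses among K deployed models and updates per-model weights from a reward in [0, 1]. For example, the reward could be 1 when a returned prediction later turns out to be correct. The class name `Exp3Selector` and the `gamma` default are hypothetical, and this sketch is not taken from Clipper's code.

```python
import math
import random
from typing import List


class Exp3Selector:
    """Minimal Exp3 bandit over K candidate models (illustrative sketch)."""

    def __init__(self, num_models: int, gamma: float = 0.1, seed: int = 0) -> None:
        if num_models < 1 or not 0.0 < gamma <= 1.0:
            raise ValueError("need num_models >= 1 and 0 < gamma <= 1")
        self.k = num_models
        self.gamma = gamma
        self.weights: List[float] = [1.0] * num_models
        self.rng = random.Random(seed)

    def probabilities(self) -> List[float]:
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def select(self) -> int:
        # Sample a model index according to the current probabilities.
        return self.rng.choices(range(self.k), weights=self.probabilities())[0]

    def update(self, model: int, reward: float) -> None:
        # reward must lie in [0, 1]; divide by p for an unbiased estimate.
        p = self.probabilities()[model]
        estimate = reward / p
        self.weights[model] *= math.exp(self.gamma * estimate / self.k)
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]  # rescale to avoid overflow
```

Rescaling by the largest weight leaves the probabilities unchanged. The `gamma / k` floor keeps every model's selection probability above zero, so the selector can still detect when a previously weak model improves.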

Abstract

Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment. In this paper, we introduce Clipper, a general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks and applications. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluate Clipper on four common machine learning benchmark datasets and demonstrate its ability to…
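Batching gives up some per-query latency in exchange for higher throughput, so the batch size must be capped to stay within a latency objective. The sketch below assumes an additive-increase/multiplicative-decrease (AIMD) policy against a fixed latency SLO. The cap grows by one query while the measured batch latency stays within the SLO and shrinks by 10% after a violation. `AIMDBatchSizer`, the step size, and the backoff factor are illustrative assumptions, and the latency model in the usage lines is made up.

```python
class AIMDBatchSizer:
    """Hypothetical AIMD controller for the maximum batch size under an SLO."""

    def __init__(self, slo_ms: float, step: int = 1, backoff: float = 0.9,
                 initial: int = 1) -> None:
        self.slo_ms = slo_ms
        self.step = step
        self.backoff = backoff
        self.max_batch = initial

    def observe(self, batch_latency_ms: float) -> int:
        """Adjust the batch-size cap after one batch and return the new cap."""
        if batch_latency_ms <= self.slo_ms:
            self.max_batch += self.step  # additive increase while within the SLO
        else:
            # multiplicative decrease after an SLO violation, never below 1
            self.max_batch = max(1, int(self.max_batch * self.backoff))
        return self.max_batch


# Usage with a made-up latency model: 2 ms fixed cost + 0.5 ms per query.
sizer = AIMDBatchSizer(slo_ms=20.0)
for _ in range(200):
    sizer.observe(2.0 + 0.5 * sizer.max_batch)
# max_batch now cycles between 33 and 37
```

Under this toy cost model, the largest batch that meets a 20 ms SLO is 36. The cap climbs to 37, violates the SLO, drops to 33, and repeats. AIMD fits this role because it probes for extra capacity slowly and backs off quickly when latency degrades.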

Taxonomy

Topics: Data Stream Mining Techniques · Caching and Content Delivery · Cloud Computing and Resource Management