TensorFlow-Serving: Flexible, High-Performance ML Serving

Christopher Olston; Noah Fiedel; Kiril Gorovoy; Jeremiah Harmsen; Li Lao; Fangwei Li; Vinu Rajashekhar; Sukriti Ramesh; Jordan Soyke

arXiv:1712.06139·cs.DC·December 29, 2017·95 cites

TL;DR

TensorFlow-Serving is a flexible, high-performance system for serving machine learning models in production. It supports multiple ML platforms and integrates with the systems that carry new models and updated versions from training to serving.

Contribution

The paper introduces an ML serving system that supports many types of ML platforms, keeps the core model lookup and inference code paths optimized, and is used in many production deployments at Google.

Findings

01

Supports multiple ML platforms and integration methods

02

Optimized core inference for high performance

03

Deployed in numerous Google production services

Abstract

We describe TensorFlow-Serving, a system to serve machine learning models inside Google which is also available in the cloud and via open-source. It is extremely flexible in terms of the types of ML platforms it supports, and ways to integrate with systems that convey new models and updated versions from training to serving. At the same time, the core code paths around model lookup and inference have been carefully optimized to avoid performance pitfalls observed in naive implementations. Google uses it in many production deployments, including a multi-tenant model hosting service called TFS^2.
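
To make the abstract's idea of model lookup across updated versions concrete, here is a minimal sketch. It assumes a simple in-memory registry and is not TensorFlow-Serving's actual API: the names `ModelRegistry`, `load`, `unload`, and `lookup` are hypothetical, and the "models" are plain Python functions standing in for loaded models.

```python
from typing import Callable, Dict, Optional

# Stand-in for a loaded model: any callable from input to prediction.
Model = Callable[[float], float]


class ModelRegistry:
    """Hypothetical in-memory registry of loaded model versions."""

    def __init__(self) -> None:
        # model name -> {version number -> loaded model}
        self._models: Dict[str, Dict[int, Model]] = {}

    def load(self, name: str, version: int, model: Model) -> None:
        # Make a new version available for serving.
        self._models.setdefault(name, {})[version] = model

    def unload(self, name: str, version: int) -> None:
        # Retire a version; later lookups fall back to what remains.
        self._models.get(name, {}).pop(version, None)

    def lookup(self, name: str, version: Optional[int] = None) -> Model:
        versions = self._models.get(name)
        if not versions:
            raise KeyError(f"no loaded versions of model {name!r}")
        if version is None:
            version = max(versions)  # default to the newest loaded version
        return versions[version]


registry = ModelRegistry()
registry.load("ranker", 1, lambda x: 2.0 * x)
registry.load("ranker", 2, lambda x: 2.0 * x + 1.0)

latest = registry.lookup("ranker")(3.0)          # served by version 2
pinned = registry.lookup("ranker", 1)(3.0)       # served by version 1
registry.unload("ranker", 2)
after_rollback = registry.lookup("ranker")(3.0)  # served by version 1 again
```

Defaulting to the newest loaded version while still allowing a pinned version is one common design choice for rollouts and rollbacks. A production server would also have to handle concurrent requests during version transitions, which this sketch leaves out.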

Taxonomy

Topics: Machine Learning and Data Classification · Parallel Computing and Optimization Techniques · Machine Learning and Algorithms