TensorFlow-Serving: Flexible, High-Performance ML Serving
Christopher Olston, Noah Fiedel, Kiril Gorovoy, Jeremiah Harmsen, Li Lao, Fangwei Li, Vinu Rajashekhar, Sukriti Ramesh, Jordan Soyke

TL;DR
TensorFlow-Serving is a flexible, high-performance system for serving machine learning models in production. It supports multiple ML platforms and integrates with the training pipelines that deliver new models and updated model versions.
Contribution
It introduces a versatile, efficient ML serving system with broad platform support and optimized core inference paths, used extensively within Google.
Findings
Supports multiple ML platforms and integration methods
Optimized core inference for high performance
Deployed in numerous Google production services
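One common way to keep model lookup off the contended path is a copy-on-write registry. Inference-time readers take a single reference to an immutable snapshot and never lock, while rare load/unload operations copy the map under a writer lock and swap the reference. The sketch below assumes that technique purely for illustration; `SnapshotRegistry` and its methods are hypothetical names, and this is not TensorFlow-Serving's actual code, which is written in C++.

```python
import threading
from types import MappingProxyType
from typing import Any, Mapping, Tuple

# Hypothetical key type: (model name, version number).
ServableKey = Tuple[str, int]


class SnapshotRegistry:
    """Copy-on-write map from (model, version) to a loaded model object.

    Readers grab the current snapshot with one attribute read and never
    take a lock. Writers serialize on a lock, build a new dict, and swap
    the reference, so readers never block on loads or unloads.
    """

    def __init__(self) -> None:
        self._write_lock = threading.Lock()
        self._snapshot: Mapping[ServableKey, Any] = MappingProxyType({})

    def lookup(self, name: str, version: int) -> Any:
        # Hot path: one reference read, then a get on an immutable view.
        snapshot = self._snapshot
        return snapshot.get((name, version))

    def load(self, name: str, version: int, model: Any) -> None:
        # Rare path: copy the current map, add the entry, publish it.
        with self._write_lock:
            updated = dict(self._snapshot)
            updated[(name, version)] = model
            self._snapshot = MappingProxyType(updated)

    def unload(self, name: str, version: int) -> None:
        # Readers still holding the old snapshot keep the model alive
        # until they drop their reference.
        with self._write_lock:
            updated = dict(self._snapshot)
            updated.pop((name, version), None)
            self._snapshot = MappingProxyType(updated)


if __name__ == "__main__":
    registry = SnapshotRegistry()
    registry.load("mnist", 1, lambda x: x * 2)  # stand-in "model"
    print(registry.lookup("mnist", 1)(21))  # 42
    registry.unload("mnist", 1)
    print(registry.lookup("mnist", 1))  # None
```

The trade-off is that each write copies the whole map, which is acceptable when model loads are infrequent compared with inference requests.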
Abstract
We describe TensorFlow-Serving, a system to serve machine learning models inside Google, which is also available in the cloud and as open source. It is extremely flexible in the types of ML platforms it supports and in the ways it integrates with systems that convey new models and updated versions from training to serving. At the same time, the core code paths around model lookup and inference have been carefully optimized to avoid performance pitfalls observed in naive implementations. Google uses it in many production deployments, including a multi-tenant model hosting service called TFS².
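A simple integration between training and serving is a shared base directory in which each exported model version lands in an integer-named subdirectory (`1/`, `2/`, ...), which the server polls. The sketch below assumes that layout and a "serve the latest N versions" policy; `aspired_versions` and `poll_base_path` are hypothetical helpers for illustration, not TensorFlow-Serving APIs.

```python
import os
import tempfile
from typing import Iterable, List


def aspired_versions(entries: Iterable[str], num_latest: int = 1) -> List[int]:
    """Return the `num_latest` highest version numbers, newest first.

    Names that are not plain ASCII digits (e.g. a staging dir such as
    "tmp-export") are ignored, so a trainer can write elsewhere and only
    expose a version once it appears under a numeric name.
    """
    if num_latest < 1:
        raise ValueError("num_latest must be >= 1")
    # Sort numerically: lexicographic order would put "10" before "2".
    versions = sorted(int(e) for e in entries if e.isascii() and e.isdigit())
    return versions[-num_latest:][::-1]


def poll_base_path(base_path: str, num_latest: int = 1) -> List[int]:
    """One polling step: list subdirectories of base_path, pick versions."""
    subdirs = [
        name for name in os.listdir(base_path)
        if os.path.isdir(os.path.join(base_path, name))
    ]
    return aspired_versions(subdirs, num_latest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        for name in ["1", "2", "10", "tmp-export"]:
            os.mkdir(os.path.join(base, name))
        print(poll_base_path(base, num_latest=2))  # [10, 2]
```

A serving loop would call `poll_base_path` periodically and load or unload versions so that the loaded set matches the returned list.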
Taxonomy
Topics: Machine Learning and Data Classification · Parallel Computing and Optimization Techniques · Machine Learning and Algorithms
