Towards ML Engineering: A Brief History Of TensorFlow Extended (TFX)
Konstantinos (Gus) Katsiapis, Abhijit Karmarkar, Ahmet Altay, Aleksandr Zaks, Neoklis Polyzotis, Anusha Ramesh, Ben Mathes, Gautam Vasudevan, Irene Giannoumis, Jarek Wilkiewicz, Jiri Simsa, Justin Hong, Mitch Trott, Noé Lutz, Pavel A. Dournov, Robert Crowe, Sarah Sirajuddin

TL;DR
This paper reviews the evolution of ML engineering through the history of TFX and Sibyl, highlighting lessons learned, key capabilities, and future directions for maturing ML practices in organizations.
Contribution
It provides a historical perspective on TFX and Sibyl, discusses lessons learned from over a decade of ML platform use, and offers recommendations for advancing ML engineering maturity.
Findings
TFX capabilities support key ML engineering aspects
Organizations benefit from investing in robust ML infrastructure
Adopting interoperable ML platforms accelerates ML maturity
Abstract
Software Engineering, as a discipline, has matured over the past 5+ decades. The modern world heavily depends on it, so the increased maturity of Software Engineering was an eventuality. Practices like testing and reliable technologies help make Software Engineering reliable enough to build industries upon. Meanwhile, Machine Learning (ML) has also grown over the past 2+ decades. ML is used more and more for research, experimentation and production workloads. ML now commonly powers widely-used products integral to our lives. But ML Engineering, as a discipline, has not widely matured as much as its Software Engineering ancestor. Can we take what we have learned and help the nascent field of applied ML evolve into ML Engineering the way Programming evolved into Software Engineering [1]? In this article we will give a whirlwind tour of Sibyl [2] and TensorFlow Extended (TFX) [3], two…
Taxonomy
Topics: Scientific Computing and Data Management · Cloud Computing and Resource Management · Software System Performance and Reliability
