Scalable End-to-End ML Platforms: from AutoML to Self-serve
Igor L. Markov, Pavlos A. Apostolopoulos, Mia R. Garrard, Tanya Qie, Yin Huang, Tanvi Gupta, Anika Li, Cesar Cardoso, George Han, Ryan Maghsoudian, Norm Zhou

TL;DR
This paper discusses the development of scalable, self-serve end-to-end ML platforms that leverage automation and integration to enable broad adoption, component reuse, and efficient system maintenance, illustrated through real-world deployments.
Contribution
It introduces the concept of self-serve ML platforms with specific requirements and capabilities, and analyzes tradeoffs and future directions for scalable ML system development.
Findings
Two real-world ML platforms demonstrate broad adoption and scalability.
Automation and integration are key to achieving self-serve capabilities.
Long-term goals and tradeoffs for platform development are identified.
Abstract
ML platforms help enable intelligent data-driven applications and maintain them with limited engineering effort. Upon sufficiently broad adoption, such platforms reach economies of scale that bring greater component reuse while improving efficiency of system development and maintenance. For an end-to-end ML platform with broad adoption, scaling relies on pervasive ML automation and system integration to reach the quality we term self-serve, which we define with ten requirements and six optional capabilities. With this in mind, we identify long-term goals for platform development and discuss related tradeoffs and future work. Our reasoning is illustrated on two commercially deployed end-to-end ML platforms that host hundreds of real-time use cases: one general-purpose and one specialized.
Taxonomy
Topics: Scientific Computing and Data Management · Parallel Computing and Optimization Techniques · Machine Learning and Data Classification
