Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs
Matthias Boehm

TL;DR
This paper presents a simple and robust yet accurate cost model for evaluating alternative runtime execution plans of large-scale ML programs. It combines control flow, IO, latency, and computation costs into a single estimate of execution time that cost-based optimizers can compare across plans.
Contribution
It introduces a robust cost model that estimates the execution time of generated runtime plans for ML programs, capturing control flow structures and multiple cost factors (IO, latency, computation) as a basis for cost-based optimization.
Findings
Because it costs generated runtime plans, the cost model automatically reflects all successive optimization phases.
It captures control flow structures and accounts for IO, latency, and computation costs.
Enhances optimizer performance in SystemML.
Abstract
Declarative large-scale machine learning (ML) aims at the specification of ML algorithms in a high-level language and automatic generation of hybrid runtime execution plans ranging from single-node, in-memory computations to distributed computations on MapReduce (MR) or similar frameworks like Spark. The compilation of large-scale ML programs exhibits many opportunities for automatic optimization. Advanced cost-based optimization techniques require, as a fundamental precondition, an accurate cost model for evaluating the impact of optimization decisions. In this paper, we share insights into a simple and robust yet accurate technique for costing alternative runtime execution plans of ML programs. Our cost model relies on generating and costing runtime plans in order to automatically reflect all successive optimization phases. Costing runtime plans also captures control flow structures…
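To make the idea of costing a runtime plan concrete, the following is a minimal sketch in Python, not the paper's actual cost model. It walks a small plan tree, scales loop bodies by an assumed iteration count, weights branches by an assumed probability, and converts IO, latency, and computation into seconds. The class names (`Op`, `Loop`, `Branch`), the constants `IO_BANDWIDTH` and `PEAK_FLOPS`, and the example plan are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical hardware constants used to convert demands into seconds.
IO_BANDWIDTH = 100e6   # bytes per second (assumed)
PEAK_FLOPS = 1e9       # floating point operations per second (assumed)


@dataclass
class Op:
    """A single runtime instruction with illustrative resource demands."""
    io_bytes: float = 0.0    # bytes read and written
    flops: float = 0.0       # floating point operations
    latency_s: float = 0.0   # fixed startup latency, e.g. job submission


@dataclass
class Loop:
    """A loop whose body is costed once and scaled by the trip count."""
    body: List["Node"]
    iterations: int          # assumed known or estimated


@dataclass
class Branch:
    """An if/else block, costed as a probability-weighted sum."""
    then_body: List["Node"]
    else_body: List["Node"]
    p_then: float = 0.5      # assumed probability of taking the then-branch


Node = Union[Op, Loop, Branch]


def cost(node: Node) -> float:
    """Return the estimated execution time of one plan node in seconds."""
    if isinstance(node, Op):
        io_time = node.io_bytes / IO_BANDWIDTH
        compute_time = node.flops / PEAK_FLOPS
        return node.latency_s + io_time + compute_time
    if isinstance(node, Loop):
        return node.iterations * cost_block(node.body)
    if isinstance(node, Branch):
        then_cost = cost_block(node.then_body)
        else_cost = cost_block(node.else_body)
        return node.p_then * then_cost + (1.0 - node.p_then) * else_cost
    raise TypeError(f"unknown plan node: {node!r}")


def cost_block(nodes: List[Node]) -> float:
    """Return the summed cost of a sequence of plan nodes."""
    return sum(cost(n) for n in nodes)


# Example plan: read 200 MB once, then loop 10 times over a compute-bound
# operation and a branch that launches a 1 s latency job half of the time.
plan = [
    Op(io_bytes=200e6),
    Loop(
        body=[
            Op(flops=2e9),
            Branch(then_body=[Op(flops=1e9, latency_s=1.0)], else_body=[]),
        ],
        iterations=10,
    ),
]

total = cost_block(plan)
print(f"estimated execution time: {total:.1f} s")  # 32.0 s
```

Expressing every factor in seconds is what lets alternative plans be compared with a single number. In this toy plan the initial read costs 2 s and each loop iteration costs 3 s, so the loop dominates the estimate.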
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Machine Learning in Materials Science
