Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs

Matthias Boehm

arXiv:1503.06384 · cs.DC · March 24, 2015 · 2 citations

TL;DR

This paper presents a simple, accurate cost model for evaluating and optimizing large-scale ML program execution plans, integrating various cost factors into a unified measure to improve runtime efficiency.

Contribution

It introduces a robust cost model that accurately estimates execution times of ML plans, capturing control flow and multiple cost factors for better optimization.

Findings

01 Costing generated runtime plans automatically reflects all successive optimization phases.

02 Incorporates control flow, IO, latency, and computation costs.

03 Enhances optimizer performance in SystemML.

Abstract

Declarative large-scale machine learning (ML) aims at the specification of ML algorithms in a high-level language and automatic generation of hybrid runtime execution plans ranging from single node, in-memory computations to distributed computations on MapReduce (MR) or similar frameworks like Spark. The compilation of large-scale ML programs exhibits many opportunities for automatic optimization. Advanced cost-based optimization techniques require, as a fundamental precondition, an accurate cost model for evaluating the impact of optimization decisions. In this paper, we share insights into a simple and robust yet accurate technique for costing alternative runtime execution plans of ML programs. Our cost model relies on generating and costing runtime plans in order to automatically reflect all successive optimization phases. Costing runtime plans also captures control flow structures…
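
To make the idea of costing a runtime plan concrete, here is a minimal sketch (not the paper's or SystemML's actual implementation). It assumes a plan is a tree of instructions and control flow blocks, that each instruction carries hypothetical per-factor time estimates (`io_seconds`, `latency_seconds`, `compute_seconds`), that loops have a known iteration count, and that branches are weighted equally. All class names, field names, and the branch weighting are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Instruction:
    """A single runtime instruction with hypothetical time estimates."""
    name: str
    io_seconds: float = 0.0
    latency_seconds: float = 0.0
    compute_seconds: float = 0.0


@dataclass
class Loop:
    """A loop block; the iteration count is assumed to be known."""
    body: List["Node"]
    iterations: int


@dataclass
class Branch:
    """An if-else block; both branches are assumed equally likely."""
    then_body: List["Node"]
    else_body: List["Node"]


Node = Union[Instruction, Loop, Branch]


def plan_cost(nodes: List[Node]) -> float:
    """Fold IO, latency, and computation into one time estimate (seconds)."""
    total = 0.0
    for node in nodes:
        if isinstance(node, Instruction):
            # Sum the individual cost factors into a single measure.
            total += node.io_seconds + node.latency_seconds + node.compute_seconds
        elif isinstance(node, Loop):
            # Scale the body cost by the assumed number of iterations.
            total += node.iterations * plan_cost(node.body)
        elif isinstance(node, Branch):
            # Average the two branches (an assumed equal weighting).
            total += 0.5 * (plan_cost(node.then_body) + plan_cost(node.else_body))
    return total


# Illustrative plan with made-up numbers.
plan = [
    Instruction("read X", io_seconds=2.0),
    Loop(
        body=[Instruction("matrix multiply", latency_seconds=1.0, compute_seconds=3.0)],
        iterations=10,
    ),
    Branch(then_body=[Instruction("write result", io_seconds=4.0)], else_body=[]),
]
print(plan_cost(plan))
```

Because the estimate is computed over the plan that would actually run, any rewrite or operator selection made during earlier compilation phases changes the tree and therefore the cost, which is the property the abstract highlights.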

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Machine Learning in Materials Science