Deep Learning with Apache SystemML
Niketan Pansare, Michael Dusenberry, Nakul Jindal, Matthias Boehm, Berthold Reinwald, Prithviraj Sen

TL;DR
This paper presents Apache SystemML, a unified framework that automatically generates optimized execution plans for machine learning and deep learning tasks on large, shared data environments, adapting to data and cluster characteristics.
Contribution
It introduces a novel compilation approach that creates adaptive runtime plans for ML/DL algorithms in shared big data environments, integrating with Hadoop and Spark.
Findings
Automatic generation of efficient execution plans for ML/DL algorithms.
Adaptive runtime plans based on data size, sparsity, and cluster configuration.
Integration with existing big data frameworks like Hadoop and Spark.
Abstract
Enterprises operate large data lakes using Hadoop and Spark frameworks that (1) run a plethora of tools to automate powerful data preparation/transformation pipelines, (2) run on shared, large clusters, and (3) perform many different analytics tasks spanning model preparation, building, evaluation, and tuning for both machine learning and deep learning. Developing machine/deep learning models on data in such shared environments is challenging. Apache SystemML provides a unified framework for implementing machine learning and deep learning algorithms in a variety of shared deployment scenarios. SystemML's novel compilation approach automatically generates runtime execution plans for machine/deep learning algorithms that are composed of single-node and distributed runtime operations depending on data and cluster characteristics such as data size, data sparsity, cluster size, and memory…
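The idea of choosing between single-node and distributed operations from data and cluster characteristics can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the names MatrixStats, estimate_memory_bytes, and choose_execution are hypothetical, and the cost model (8 bytes per dense cell, a CSR-like sparse layout, a single memory budget) is a simplification, not SystemML's actual optimizer or API.

```python
from dataclasses import dataclass


@dataclass
class MatrixStats:
    """Hypothetical summary of a matrix: dimensions and fraction of non-zeros."""
    rows: int
    cols: int
    sparsity: float  # fraction of non-zero cells, in [0, 1]


def estimate_memory_bytes(m: MatrixStats) -> int:
    """Rough in-memory size estimate under an assumed cost model."""
    cells = m.rows * m.cols
    dense = cells * 8  # 8 bytes per double-precision cell
    # Assumed CSR-like layout: 8-byte value + 4-byte column index per
    # non-zero, plus one 4-byte row pointer per row (and one extra).
    nnz = int(cells * m.sparsity)
    sparse = nnz * 12 + (m.rows + 1) * 4
    # Assume the runtime picks whichever representation is smaller.
    return min(dense, sparse)


def choose_execution(inputs, output, memory_budget_bytes):
    """Pick 'single_node' if inputs and output fit in the budget, else 'distributed'."""
    total = sum(estimate_memory_bytes(m) for m in inputs)
    total += estimate_memory_bytes(output)
    return "single_node" if total <= memory_budget_bytes else "distributed"


if __name__ == "__main__":
    budget = 1 << 30  # assumed 1 GiB budget on the driver node
    v = MatrixStats(1000, 1, 1.0)            # dense vector
    out = MatrixStats(10_000_000, 1, 1.0)    # result of X %*% v
    dense_x = MatrixStats(10_000_000, 1000, 1.0)
    sparse_x = MatrixStats(10_000_000, 1000, 0.001)
    print(choose_execution([dense_x, v], out, budget))
    print(choose_execution([sparse_x, v], out, budget))
```

In this sketch the two matrix-vector products have identical dimensions, yet the dense input exceeds the assumed budget while the sparse one fits, so the same operation is planned differently depending on sparsity alone. A real compiler would weigh many more factors, including the cluster size and memory configuration that the abstract mentions.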
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Data Processing Techniques · Software System Performance and Reliability
