Scaling Datalog for Machine Learning on Big Data
Yingyi Bu, Vinayak Borkar, Michael J. Carey, Joshua Rosen, Neoklis Polyzotis, Tyson Condie, Markus Weimer, Raghu Ramakrishnan

TL;DR
This paper advocates for using Datalog, a declarative query language, to unify and optimize machine learning workflows on big data, demonstrating promising performance and flexibility.
Contribution
It introduces a declarative approach using Datalog to program, optimize, and execute diverse machine learning models on large-scale data systems.
Findings
Declarative Datalog-based models can effectively represent ML algorithms.
Optimized execution plans improve performance on large clusters.
The approach offers increased flexibility and ease of programming.
Abstract
In this paper, we present the case for a declarative foundation for data-intensive machine learning systems. Instead of creating a new system for each specific flavor of machine learning task, or hardcoding new optimizations, we argue for the use of recursive queries to program a variety of machine learning systems. By taking this approach, database query optimization techniques can be utilized to identify effective execution plans, and the resulting runtime plans can be executed on a single unified data-parallel query processing engine. As a proof of concept, we consider two programming models from the machine learning domain, Pregel and Iterative Map-Reduce-Update, and show how they can be captured in Datalog, tuned for a specific task, and then compiled into an optimized physical plan. Experiments performed on a large computing cluster with real data demonstrate that this…
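To make the Iterative Map-Reduce-Update model named in the abstract more concrete, the following is a minimal single-machine sketch in Python, not the paper's Datalog formulation or its compiled physical plan. It assumes the usual reading of the model: a map step computes a statistic per record given the current model, a reduce step aggregates those statistics, and an update step produces the next model, repeated until the model stops changing. All names here (`imru`, `grad_map`, `grad_sum`, `step`), the toy data, and the step size are hypothetical choices for illustration.

```python
from functools import reduce


def imru(data, init_model, map_fn, reduce_fn, update_fn, max_iters=100, tol=1e-9):
    """Run a map / reduce / update loop until the scalar model converges.

    Hypothetical helper for illustration only; the model is assumed to be
    a single float so that convergence can be checked with abs().
    """
    model = init_model
    for _ in range(max_iters):
        # Map: one statistic per record, given the current model.
        # Reduce: fold the per-record statistics into one aggregate.
        stats = reduce(reduce_fn, (map_fn(record, model) for record in data))
        # Update: derive the next model from the aggregate.
        new_model = update_fn(model, stats)
        if abs(new_model - model) < tol:
            return new_model
        model = new_model
    return model


# Toy task (assumed, not from the paper): fit y = w * x by gradient descent.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def grad_map(record, w):
    x, y = record
    return (w * x - y) * x  # per-record gradient of 0.5 * (w*x - y)^2


def grad_sum(a, b):
    return a + b  # aggregate gradients across records


def step(w, g):
    return w - 0.05 * g  # fixed step size, chosen for this toy data


w = imru(data, 0.0, grad_map, grad_sum, step)
```

In this sketch the loop is the recursion: each iteration derives a new model from the previous one, which is the kind of dependency the abstract proposes expressing as a recursive query so that an optimizer, rather than hand-written driver code, chooses how to execute it.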
Taxonomy
Topics: Advanced Database Systems and Queries · Cloud Computing and Resource Management · Data Management and Algorithms
