Towards Linear Algebra over Normalized Data
Lingjiao Chen, Arun Kumar, Jeffrey Naughton, Jignesh M. Patel

TL;DR
This paper introduces a framework that uses algebraic rewrite rules to perform linear algebra directly over normalized relational data, enabling efficient, automatic factorization of ML algorithms without manual rewriting.
Contribution
The paper presents an algebraic approach to performing linear algebra over normalized data that unifies and generalizes prior factorization methods for ML algorithms.
Findings
Achieves up to 36x speed-up on real data
Enables automatic factorization of multiple ML algorithms
Unifies prior work through algebraic rewriting
Abstract
Providing machine learning (ML) over relational data is a mainstream requirement for data analytics systems. While almost all the ML tools require the input data to be presented as a single table, many datasets are multi-table, which forces data scientists to join those tables first, leading to data redundancy and runtime waste. Recent works on "factorized" ML mitigate this issue for a few specific ML algorithms by pushing ML through joins. But their approaches require a manual rewrite of ML implementations. Such piecemeal methods create a massive development overhead when extending such ideas to other ML algorithms. In this paper, we show that it is possible to mitigate this overhead by leveraging a popular formal algebra to represent the computations of many ML algorithms: linear algebra. We introduce a new logical data type to represent normalized data and devise a framework of…
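To make the idea of "pushing" linear algebra through a join concrete, here is a minimal sketch of one such rewrite, under stated assumptions: a two-table schema in which a fact table S references a dimension table R through a foreign-key column fk, so that the joined table is T = [S, R[fk]]. A matrix-vector product over T can then be computed as S·w_S plus a lookup into R·w_R, without ever materializing T. All names here (`materialized_matvec`, `factorized_matvec`, `fk`, the table sizes) are hypothetical and chosen for illustration; this is not the paper's implementation, and it uses plain Python lists only so that it runs without dependencies.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def materialized_matvec(S, fk, R, w):
    """Baseline: join first, then multiply.

    Each row of the joined table T is a row of S concatenated with the
    row of R that its foreign key points to.
    """
    T = [s_row + R[k] for s_row, k in zip(S, fk)]
    return [dot(t_row, w) for t_row in T]


def factorized_matvec(S, fk, R, w):
    """Rewritten form: T w = S w_S + (R w_R)[fk].

    The product with R is computed once per row of R and then looked up
    through the foreign key, so the redundancy of the join is avoided.
    """
    d_s = len(S[0])
    w_s, w_r = w[:d_s], w[d_s:]
    partial = [dot(r_row, w_r) for r_row in R]  # one value per R row
    return [dot(s_row, w_s) + partial[k] for s_row, k in zip(S, fk)]


# Hypothetical sizes: many S rows referencing few R rows.
random.seed(0)
n_s, n_r, d_s, d_r = 200, 5, 2, 8
S = [[random.gauss(0, 1) for _ in range(d_s)] for _ in range(n_s)]
R = [[random.gauss(0, 1) for _ in range(d_r)] for _ in range(n_r)]
fk = [random.randrange(n_r) for _ in range(n_s)]
w = [random.gauss(0, 1) for _ in range(d_s + d_r)]

joined = materialized_matvec(S, fk, R, w)
factored = factorized_matvec(S, fk, R, w)

# Largest absolute difference between the two results.
max_gap = max(abs(a - b) for a, b in zip(joined, factored))
```

In this sketch the baseline performs n_s·(d_s + d_r) multiplications, while the rewritten form performs n_s·d_s + n_r·d_r, so the saving grows as R gets wider and as each R row is referenced by more S rows. The two results agree up to floating-point rounding.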
Taxonomy
Topics: Advanced Database Systems and Queries · Data Mining Algorithms and Applications · Data Quality and Management
