Learning Models over Relational Data: A Brief Tutorial
Maximilian Schleich, Dan Olteanu, Mahmoud Abo-Khamis, Hung Q. Ngo, XuanLong Nguyen

TL;DR
This tutorial discusses efficient methods for learning models over relational data by leveraging database techniques to avoid costly data materialization, focusing on query-based computation and structural properties.
Contribution
It advocates for a first-principles, database-centric approach to learning over relational data, highlighting recent techniques to improve performance without full data materialization.
Findings
Casting the data-intensive part of a learning task as a batch of aggregate queries avoids the cost of materializing and exporting the training dataset.
Structural properties like hypertree width enable optimized query execution.
Techniques such as factorized computation and parallelization enhance efficiency.
Abstract
This tutorial overviews the state of the art in learning models over relational databases and makes the case for a first-principles approach that exploits recent developments in database research. The input to learning classification and regression models is a training dataset defined by feature extraction queries over relational databases. The mainstream approach to learning over relational data is to materialize the training dataset, export it out of the database, and then learn over it using a statistical package. This approach can be expensive as it requires the materialization of the training dataset. An alternative approach is to cast the machine learning problem as a database problem by transforming the data-intensive component of the learning task into a batch of aggregates over the feature extraction query and by computing this batch directly over the input database. The…
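To make the "batch of aggregates" idea concrete, here is a minimal sketch in Python. It is not the tutorial's system: the toy schema R(k, x) and S(k, y), the function names, and the choice of simple linear regression of y on x are all assumptions made for illustration. The sketch computes the five sums that least squares needs (count, SUM(x), SUM(y), SUM(x*x), SUM(x*y)) over the join of R and S by first aggregating each relation per join key and then combining the partial results, so the join itself is never built. A second function materializes the join only to check that both routes agree.

```python
from collections import defaultdict


def aggregates_pushed_down(R, S):
    """Aggregates over R(k, x) JOIN S(k, y), computed without building the join.

    R and S are lists of (key, value) tuples; the schema is a toy assumption.
    """
    # Per-key partial aggregates over R: [count, sum of x, sum of x*x].
    r_part = defaultdict(lambda: [0, 0.0, 0.0])
    for k, x in R:
        p = r_part[k]
        p[0] += 1
        p[1] += x
        p[2] += x * x

    # Per-key partial aggregates over S: [count, sum of y].
    s_part = defaultdict(lambda: [0, 0.0])
    for k, y in S:
        p = s_part[k]
        p[0] += 1
        p[1] += y

    # Combine partial aggregates for keys present in both relations.
    agg = {"n": 0, "sx": 0.0, "sy": 0.0, "sxx": 0.0, "sxy": 0.0}
    for k, (rc, rx, rxx) in r_part.items():
        if k not in s_part:
            continue
        sc, sy = s_part[k]
        agg["n"] += rc * sc      # each R tuple pairs with each S tuple
        agg["sx"] += rx * sc     # every x value is repeated sc times
        agg["sy"] += rc * sy     # every y value is repeated rc times
        agg["sxx"] += rxx * sc
        agg["sxy"] += rx * sy    # sum over pairs of x*y factorizes per key
    return agg


def aggregates_materialized(R, S):
    """Same aggregates, computed the mainstream way over the materialized join."""
    joined = [(x, y) for k1, x in R for k2, y in S if k1 == k2]
    return {
        "n": len(joined),
        "sx": sum(x for x, _ in joined),
        "sy": sum(y for _, y in joined),
        "sxx": sum(x * x for x, _ in joined),
        "sxy": sum(x * y for x, y in joined),
    }


def fit_line(agg):
    """Least-squares slope and intercept of y on x from the five aggregates."""
    n, sx, sy, sxx, sxy = (agg[key] for key in ("n", "sx", "sy", "sxx", "sxy"))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Toy input relations (made-up data).
R = [(1, 1.0), (1, 2.0), (2, 3.0)]
S = [(1, 2.0), (2, 6.0), (2, 7.0), (3, 9.0)]
slope, intercept = fit_line(aggregates_pushed_down(R, S))
```

The pushed-down version makes one pass over each relation and one pass over the distinct join keys, whereas the materialized version does work proportional to the size of the join. In this two-relation toy case the saving is modest; the sketch only illustrates the principle that the learning step needs the aggregates, not the joined rows.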
