Model Joins: Enabling Analytics Over Joins of Absent Big Tables
Ali Mohammadi Shanghooshabad, Peter Triantafillou

TL;DR
This paper introduces Model Join, a framework that creates models of absent tables to enable analytics on their joins without accessing raw data, facilitating privacy-preserving and cost-effective knowledge discovery.
Contribution
It proposes a novel framework for joining models of absent tables to approximate raw data joins, supporting various downstream analytics tasks.
Findings
Effective approximation of raw data joins using model joins
Supports multiple learning, knowledge discovery, and analytics (LKD) tasks with high-quality samples
Demonstrated usefulness on TPC-DS and synthetic data
Abstract
This work is motivated by two key facts. First, it is highly desirable to be able to learn and perform knowledge discovery and analytics (LKD) tasks without the need to access raw-data tables. This may be due to organizations finding it increasingly frustrating and costly to manage and maintain ever-growing tables, or for privacy reasons. Hence, compact models can be developed from the raw data and used instead of the tables. Second, oftentimes, LKD tasks are to be performed on a (potentially very large) table which is itself the result of joining separate (potentially very large) relational tables. But how can one do this, when the individual to-be-joined tables are absent? Here, we pose the following fundamental questions: Q1: How can one "join models" of (absent/deleted) tables or "join models with other tables" in a way that enables LKD as if it were performed on the join of the…
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Data Management and Algorithms
