Blend: A Unified Data Discovery System

Mahdi Esmailoghli; Christoph Schnell; Renée J. Miller; Ziawasch Abedjan

arXiv:2310.02656 · cs.DB · December 2, 2024 · 1 citation

TL;DR

BLEND is a comprehensive data discovery system that integrates multiple operators with a unified index and optimizer, enabling flexible and efficient discovery pipelines for complex data tasks.

Contribution

The paper introduces a unified system that supports multiple discovery operators through a shared index and a rule-based optimizer, offering more flexibility and efficiency than existing ad-hoc solutions.

Findings

01. Outperforms stand-alone discovery solutions in flexibility and speed.
02. Supports complex discovery pipelines with a unified approach.
03. Reduces execution time through a rule-based optimizer.

Abstract

Most research on data discovery has so far focused on improving individual discovery operators such as join, correlation, or union discovery. However, in practice, a combination of these techniques and their corresponding indexes may be necessary to support arbitrary discovery tasks. We propose BLEND, a comprehensive data discovery system that supports existing operators and enables their flexible pipelining. BLEND is based on a set of lower-level operators that serve as fundamental building blocks for more complex and sophisticated user tasks. To reduce the execution runtime of discovery pipelines, we propose a unified index structure and a rule-based optimizer that rewrites SQL statements into low-level operators when possible. We show the superior flexibility and efficiency of our system compared to ad-hoc discovery pipelines and stand-alone solutions.
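To make the idea of several discovery operators sharing one index more concrete, here is a minimal Python sketch. It is not BLEND's index, operator set, or optimizer: the `CellIndex` class, its method names, the `pipeline` helper, and the sample tables are all hypothetical, chosen only to illustrate how a join-discovery operator and a keyword-search operator could read the same postings and be chained into a small pipeline.

```python
from collections import defaultdict


class CellIndex:
    """Toy inverted index (hypothetical): cell value -> {(table, column, row)}."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_table(self, name, columns, rows):
        # Index every cell once; both operators below reuse these postings.
        for r, row in enumerate(rows):
            for col, value in zip(columns, row):
                self.postings[str(value).lower()].add((name, col, r))

    def joinable_columns(self, query_values, k=None):
        # Join discovery: rank (table, column) pairs by distinct value overlap.
        overlap = defaultdict(set)
        for v in {str(q).lower() for q in query_values}:
            for table, col, _ in self.postings.get(v, ()):
                overlap[(table, col)].add(v)
        ranked = sorted(overlap.items(), key=lambda kv: (-len(kv[1]), kv[0]))
        return [(tc, len(vals)) for tc, vals in ranked[:k]]

    def keyword_tables(self, keywords):
        # Keyword search: count how many distinct keywords each table contains.
        hits = defaultdict(set)
        for kw in {str(q).lower() for q in keywords}:
            for table, _, _ in self.postings.get(kw, ()):
                hits[table].add(kw)
        return {table: len(found) for table, found in hits.items()}


def pipeline(index, join_values, keywords, k=5):
    # Chain the two operators: keep joinable columns only from tables
    # that also match at least one keyword.
    allowed = set(index.keyword_tables(keywords))
    candidates = index.joinable_columns(join_values)
    return [(tc, n) for tc, n in candidates if tc[0] in allowed][:k]


idx = CellIndex()
idx.add_table("cities", ["city", "country"],
              [["Berlin", "Germany"], ["Paris", "France"], ["Rome", "Italy"]])
idx.add_table("offices", ["office", "city"],
              [["HQ", "Berlin"], ["Lab", "Paris"]])
idx.add_table("fruits", ["name", "origin"],
              [["Apple", "Italy"], ["Berlin", "n/a"]])

result = pipeline(idx, ["Berlin", "Paris", "Rome"], ["HQ"])
print(result)
```

In this toy setup both operators answer their queries from a single structure, so chaining them needs no second index. How BLEND actually lays out its index and which rewrite rules its optimizer applies are described in the paper itself, not here.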

Code & Models

Repositories

luh-dbs/blend (official)

Taxonomy

Topics: Data Management and Algorithms · Data Quality and Management · Advanced Database Systems and Queries