Blend: A Unified Data Discovery System
Mahdi Esmailoghli, Christoph Schnell, Renée J. Miller, Ziawasch Abedjan

TL;DR
BLEND is a comprehensive data discovery system that integrates multiple operators with a unified index and optimizer, enabling flexible and efficient discovery pipelines for complex data tasks.
Contribution
The paper introduces a unified system that supports multiple discovery operators through a shared index and a rule-based optimizer, improving flexibility and efficiency over ad-hoc pipelines and stand-alone solutions.
Findings
Outperforms stand-alone discovery solutions in flexibility and speed.
Supports complex discovery pipelines with a unified approach.
Reduces execution time through a rule-based optimizer.
Abstract
Most research on data discovery has so far focused on improving individual discovery operators such as join, correlation, or union discovery. However, in practice, a combination of these techniques and their corresponding indexes may be necessary to support arbitrary discovery tasks. We propose BLEND, a comprehensive data discovery system that supports existing operators and enables their flexible pipelining. BLEND is based on a set of lower-level operators that serve as fundamental building blocks for more complex and sophisticated user tasks. To reduce the execution runtime of discovery pipelines, we propose a unified index structure and a rule-based optimizer that rewrites SQL statements into low-level operators when possible. We show the superior flexibility and efficiency of our system compared to ad-hoc discovery pipelines and stand-alone solutions.
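To make the idea of pipelining discovery operators over one shared index more concrete, here is a minimal Python sketch. It is not BLEND's actual index, operator set, or API: the toy data lake, the function names (`build_index`, `keyword_search`, `join_search`, `intersect`), and the ranking rules are all assumptions chosen for illustration. It only shows how two different discovery operators might read the same cell-level inverted index and how their results could be combined.

```python
from collections import Counter, defaultdict

# Hypothetical toy data lake: table name -> list of rows (each row is a list of cells).
LAKE = {
    "t1": [["berlin", "germany"], ["paris", "france"]],
    "t2": [["berlin", 3.6], ["rome", 2.8]],
    "t3": [["tokyo", "japan"]],
}


def build_index(lake):
    """Map each cell value to the (table, column, row) positions where it occurs."""
    index = defaultdict(list)
    for table, rows in lake.items():
        for r, row in enumerate(rows):
            for c, value in enumerate(row):
                index[value].append((table, c, r))
    return index


def keyword_search(index, keywords):
    """Rank tables by how many distinct keywords they contain."""
    hits = Counter()
    for kw in set(keywords):
        for table in {t for t, _, _ in index.get(kw, [])}:
            hits[table] += 1
    return hits


def join_search(index, query_column):
    """Rank tables by the best value overlap of any single column with the query column."""
    overlap = Counter()
    for value in set(query_column):
        for table, col in {(t, c) for t, c, _ in index.get(value, [])}:
            overlap[(table, col)] += 1
    best = Counter()
    for (table, _), n in overlap.items():
        best[table] = max(best[table], n)
    return best


def intersect(*results):
    """Combiner: keep only the tables returned by every operator."""
    common = set(results[0])
    for r in results[1:]:
        common &= set(r)
    return common


# Both operators read the same index; no operator-specific index is built.
INDEX = build_index(LAKE)
joinable = join_search(INDEX, ["berlin", "rome", "madrid"])
mentions = keyword_search(INDEX, ["berlin"])
result = intersect(joinable, mentions)
```

In this sketch a single index serves both operators, which is the property the abstract attributes to a unified index structure. A real system would also need an optimizer to decide operator order and rewriting, which this example leaves out.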
Taxonomy
Topics: Data Management and Algorithms · Data Quality and Management · Advanced Database Systems and Queries
