A Unified System for Data Analytics and In Situ Query Processing
Alex Watson, Suvam Kumar Das, Suprio Ray

TL;DR
DaskDB is a unified system built on Python's Dask that enables efficient in situ SQL querying and data analysis over heterogeneous sources, reducing data transfer costs and integrating seamlessly with existing Python workflows.
Contribution
The paper introduces DaskDB, a novel system combining data analytics and in situ SQL processing, with a distributed learned index to optimize join operations.
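To make the learned-index idea concrete, here is a minimal single-node sketch, not DaskDB's implementation. It assumes a single linear model over sorted, unique join keys; the names `LinearLearnedIndex` and `learned_index_join` are hypothetical. The model predicts where a key sits in the sorted array, and a bounded binary search corrects the prediction.

```python
from bisect import bisect_left


class LinearLearnedIndex:
    """Toy learned index: a linear model predicts a key's position in a sorted list."""

    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        # Least-squares fit of: position ~ slope * key + intercept
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst prediction error over the stored keys bounds the search window.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(self.keys))

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def lookup(self, key):
        """Return the position of `key`, or None if it is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        pos = bisect_left(self.keys, key, lo, hi)
        if pos < n and self.keys[pos] == key:
            return pos
        return None


def learned_index_join(probe_rows, build_rows, key):
    """Inner join that probes a learned index built on the build side.

    Assumption for this sketch: keys on the build side are unique.
    """
    build_sorted = sorted(build_rows, key=lambda r: r[key])
    index = LinearLearnedIndex([r[key] for r in build_sorted])
    joined = []
    for row in probe_rows:
        pos = index.lookup(row[key])
        if pos is not None:
            joined.append({**build_sorted[pos], **row})
    return joined


# Hypothetical sample data, for illustration only.
customers = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 5, "name": "Lin"}, {"cust_id": 9, "name": "Sam"}]
orders = [{"order_id": 100, "cust_id": 5}, {"order_id": 101, "cust_id": 2}, {"order_id": 102, "cust_id": 9}]
joined = learned_index_join(orders, customers, "cust_id")
```

The sketch keeps the search window tight when keys are close to linearly distributed. How DaskDB partitions or distributes its index across workers is not described in this summary and is not modeled here.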
Findings
DaskDB significantly outperforms existing systems in experiments.
Supports invoking Python APIs as UDFs within SQL queries.
Reduces data transfer costs by unifying analysis and querying in one system.
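The idea of calling a Python function as a UDF inside a SQL query can be illustrated with a small sketch. It uses the standard library's `sqlite3` module purely as a stand-in, since DaskDB's own UDF mechanism is not shown in this summary; the table `trips` and the function `log_scale` are hypothetical.

```python
import math
import sqlite3


def log_scale(value):
    """Plain Python function that will be exposed to SQL as a scalar UDF."""
    return math.log1p(value)


conn = sqlite3.connect(":memory:")
# Register the Python function under a SQL-visible name taking one argument.
conn.create_function("log_scale", 1, log_scale)

conn.execute("CREATE TABLE trips (id INTEGER, distance REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(1, 0.0), (2, 9.0), (3, 99.0)],
)

# The UDF is used in both the SELECT list and the WHERE clause.
rows = conn.execute(
    "SELECT id, log_scale(distance) FROM trips "
    "WHERE log_scale(distance) > 1 ORDER BY id"
).fetchall()
conn.close()
```

The point of the sketch is only that the analysis logic stays in Python while the filtering and projection stay in SQL, so no data has to be exported to a second system between the two steps.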
Abstract
Data is now generated at such a high rate that analyzing it and obtaining results quickly has become essential. Most relational databases primarily support SQL querying, with limited support for complex data analysis. As a result, data scientists must use a separate system for complex analysis, and data science frameworks are in high demand. Using such a framework, however, requires loading all the data into it, which entails significant and potentially expensive data movement across multiple systems. We believe there is a pressing need for a single system that can perform both data analysis tasks and SQL querying, sparing data scientists the expensive transfer of data across systems. In our work, we present DaskDB, a system built over Python's Dask…
Taxonomy
Topics: Advanced Database Systems and Queries · Data Quality and Management · Scientific Computing and Data Management
