Towards Efficient Random-Order Enumeration for Join Queries
Pengyu Chen, Zizheng Guo, Jianwei Yang, Dongjing Miao

TL;DR
This paper introduces an efficient, near-optimal algorithm for uniformly random enumeration of join query results, significantly improving over existing methods in speed and flexibility, with practical experimental validation.
Contribution
We develop the first efficient random-order enumeration algorithm for join queries that offers worst-case guarantees, requires no query-specific preprocessing, and adapts to common database indexes.
Findings
Expected delay of $O(\frac{\mathrm{AGM}(Q)}{|Res(Q)|}\log^2|Q|)$
Total runtime of $O(\mathrm{AGM}(Q)\log|Q|)$ after $O(|Q|\log|Q|)$ preprocessing
Significant performance improvements over existing methods
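A worked instance may help in reading these bounds. As an illustrative assumption (the example is not taken from the paper), consider the triangle query $Q = R(a,b) \bowtie S(b,c) \bowtie T(c,a)$ with $|R| = |S| = |T| = N$. Its AGM bound is $N^{3/2}$, obtained by giving each relation weight $\tfrac{1}{2}$ in the fractional edge cover. Substituting into the bounds above gives:

$$
\begin{aligned}
\mathrm{AGM}(Q) &= N^{1/2} \cdot N^{1/2} \cdot N^{1/2} = N^{3/2},\\
\text{expected delay} &= O\!\left(\frac{N^{3/2}}{|Res(Q)|}\log^2|Q|\right),\\
\text{total runtime} &= O\!\left(N^{3/2}\log|Q|\right).
\end{aligned}
$$

On an instance where $|Res(Q)| = \Theta(N^{3/2})$, the expected delay would therefore be $O(\log^2|Q|)$; the sparser the result is relative to the AGM bound, the larger the expected delay between consecutive tuples.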
Abstract
In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful representation of the whole join result, the result tuples are required to be enumerated in uniformly random order. However, existing studies lack an efficient random-order enumeration algorithm with a worst-case runtime guarantee for (cyclic) join queries. In this paper, we study the problem of enumerating the results of a join query in random order. We develop an efficient random-order enumeration algorithm for join queries with no large hidden constants in its complexity, achieving $O(\frac{\mathrm{AGM}(Q)}{|Res(Q)|}\log^2|Q|)$ expected delay, $O(\mathrm{AGM}(Q)\log|Q|)$ total running time after $O(|Q|\log|Q|)$-time index construction, where…
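To make the problem statement concrete, the following is a minimal sketch of a naive baseline for random-order enumeration. It is not the paper's algorithm: it materializes the full join first and then emits tuples with a lazy Fisher-Yates shuffle, so it pays the whole join cost before the first tuple appears. The function names `natural_join` and `random_order`, the dict-based row format, and the toy relations are all hypothetical choices made for illustration.

```python
import random
from itertools import product


def natural_join(r, s):
    """Nested-loop natural join of two lists of dict rows (illustrative only)."""
    out = []
    for a, b in product(r, s):
        shared = a.keys() & b.keys()
        # Keep the pair only if the rows agree on every shared attribute.
        if all(a[k] == b[k] for k in shared):
            out.append({**a, **b})
    return out


def random_order(results, rng):
    """Yield every element exactly once, in uniformly random order."""
    pool = list(results)  # copy so the caller's list is not reordered
    for i in range(len(pool) - 1, -1, -1):
        j = rng.randint(0, i)  # uniform over positions not yet emitted
        pool[i], pool[j] = pool[j], pool[i]
        yield pool[i]


# Toy triangle query R(a, b) JOIN S(b, c) JOIN T(c, a); data is made up.
R = [{"a": 1, "b": 2}, {"a": 1, "b": 3}]
S = [{"b": 2, "c": 4}, {"b": 3, "c": 4}]
T = [{"c": 4, "a": 1}]

full = natural_join(natural_join(R, S), T)
for row in random_order(full, random.Random()):
    print(row)
```

The baseline gives constant delay per tuple only after the entire result has been computed, which is the cost that an enumeration algorithm with a bounded expected delay is meant to avoid.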
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Quality and Management
