Towards Efficient Random-Order Enumeration for Join Queries

Pengyu Chen; Zizheng Guo; Jianwei Yang; Dongjing Miao

arXiv:2507.00489 · cs.DB · July 2, 2025

TL;DR

This paper introduces an efficient, near-optimal algorithm for uniformly random enumeration of join query results, significantly improving over existing methods in speed and flexibility, with practical experimental validation.

Contribution

We develop the first efficient random-order enumeration algorithm for join queries that offers worst-case guarantees, requires no query-specific preprocessing, and adapts to common database indexes.

Findings

01. Expected delay of $O(\frac{\mathrm{AGM}(Q)}{|Res(Q)|}\log^2|Q|)$

02. Total runtime of $O(\mathrm{AGM}(Q)\log|Q|)$ after $O(|Q|\log|Q|)$-time index construction

03. Significant performance improvements over existing methods
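
To give a feel for the $\frac{\mathrm{AGM}(Q)}{|Res(Q)|}$ factor in the delay bound, here is a small worked example. The triangle query and the sizes below are assumptions chosen purely for illustration; they are not taken from the paper. The only outside fact used is that the AGM bound of the triangle query is the square root of the product of the three relation sizes.

```latex
% Illustrative only: the query, N, and |Res(Q)| are assumed values.
\begin{aligned}
Q(a,b,c) &= R(a,b) \bowtie S(b,c) \bowtie T(a,c), \qquad |R| = |S| = |T| = N \\
\mathrm{AGM}(Q) &= \sqrt{|R|\,|S|\,|T|} = N^{3/2} \\
N = 10^{4} &\;\Rightarrow\; \mathrm{AGM}(Q) = 10^{6} \\
|Res(Q)| = 10^{5} &\;\Rightarrow\; \frac{\mathrm{AGM}(Q)}{|Res(Q)|} = 10
\end{aligned}
```

Under these assumed numbers the expected delay would be on the order of $10 \cdot \log^2|Q|$ steps per result. The ratio grows as the actual result becomes small relative to the AGM bound, which is where the delay guarantee is weakest.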

Abstract

In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful representation of the whole join result, the result tuples are required to be enumerated in uniformly random order. However, existing studies lack an efficient random-order enumeration algorithm with a worst-case runtime guarantee for (cyclic) join queries. In this paper, we study the problem of enumerating the results of a join query in random order. We develop an efficient random-order enumeration algorithm for join queries with no large hidden constants in its complexity, achieving expected $O(\frac{\mathrm{AGM}(Q)}{|Res(Q)|}\log^2|Q|)$ delay, $O(\mathrm{AGM}(Q)\log|Q|)$ total running time after $O(|Q|\log|Q|)$-time index construction, where $|Q|$ is…
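
As a point of reference for what "random-order enumeration" asks for, the following is a minimal sketch of a naive baseline: materialize the whole join result, then emit it with a lazy Fisher-Yates shuffle. This is not the paper's algorithm. The function names (`triangle_join`, `random_order`), the triangle query, and the sample relations are all hypothetical, chosen only for illustration, and the sketch assumes each relation contains no duplicate tuples.

```python
import random
from typing import Dict, Iterator, List, Tuple

Edge = Tuple[int, int]
Triangle = Tuple[int, int, int]


def triangle_join(r: List[Edge], s: List[Edge], t: List[Edge]) -> List[Triangle]:
    """Materialize Q(a, b, c) = R(a, b) JOIN S(b, c) JOIN T(a, c).

    Hypothetical helper for illustration; assumes duplicate-free relations.
    """
    # Index S by its first attribute so R-tuples can be extended on b.
    s_by_b: Dict[int, List[int]] = {}
    for b, c in s:
        s_by_b.setdefault(b, []).append(c)
    t_set = set(t)  # constant-time membership checks for (a, c)
    out: List[Triangle] = []
    for a, b in r:
        for c in s_by_b.get(b, []):
            if (a, c) in t_set:
                out.append((a, b, c))
    return out


def random_order(results: List[Triangle], rng: random.Random) -> Iterator[Triangle]:
    """Yield every result exactly once, in uniformly random order.

    Lazy Fisher-Yates: each step swaps one uniformly chosen remaining
    element into the last open slot and yields it.
    """
    pool = list(results)  # copy so the caller's list is left untouched
    for i in range(len(pool) - 1, -1, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        pool[i], pool[j] = pool[j], pool[i]
        yield pool[i]


# Usage with made-up sample relations.
R = [(1, 2), (1, 3), (2, 3)]
S = [(2, 3), (3, 4), (3, 1)]
T = [(1, 3), (1, 4), (2, 1)]
results = triangle_join(R, S, T)  # [(1, 2, 3), (1, 3, 4), (2, 3, 1)]
for row in random_order(results, random.Random()):
    print(row)
```

The shuffle itself is cheap per emitted tuple, but the first tuple cannot be produced until the full join has been computed. An algorithm with a per-result delay guarantee, as described in the abstract, avoids paying that cost up front.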

Taxonomy

Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Quality and Management