Towards Fast Theta-join: A Prefiltering and Amalgamated Partitioning Approach
Jiashu Wu, Yang Wang, Xiaopeng Fan, Kejiang Ye, Chengzhong Xu

TL;DR
This paper introduces Prefap, a novel fast theta-join algorithm that combines prefiltering and amalgamated partitioning to improve efficiency by reducing the number of Cartesian products in distributed data stream processing.
Contribution
It presents a new framework that enhances the existing FastThetaJoin algorithm with prefiltering and amalgamated partitioning, achieving better performance in distributed theta-join operations.
Findings
Prefap reduces the number of Cartesian products compared to existing algorithms.
Prefap demonstrates higher efficiency in both synthetic and real data stream tests.
Prefiltering before partitioning reduces the amount of data involved and enables more fine-grained partitioning.
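To illustrate why partitioning reduces Cartesian products, the sketch below compares a naive theta-join against a simple range-partitioned one. It assumes a single inequality predicate `a < b` over numeric values. It is a generic range-partitioning example, not FastThetaJoin or Prefap's amalgamated partitioning, and the function names are hypothetical.

```python
from itertools import product


def naive_theta_join(r, s):
    """Evaluate a < b over the full Cartesian product of r and s (hypothetical helper)."""
    pairs = [(a, b) for a, b in product(r, s) if a < b]
    return pairs, len(r) * len(s)  # every pair is compared


def partitioned_theta_join(r, s, num_parts):
    """Range-partition both inputs and skip partition pairs that cannot satisfy a < b.

    Generic illustration only; not the paper's partitioning scheme.
    """
    lo = min(min(r), min(s))
    hi = max(max(r), max(s))
    width = (hi - lo) / num_parts or 1  # avoid zero width when all values are equal

    def bucket(v):
        # Monotone mapping from value to partition; clamp the maximum into the last one.
        return min(int((v - lo) / width), num_parts - 1)

    r_parts = [[] for _ in range(num_parts)]
    s_parts = [[] for _ in range(num_parts)]
    for a in r:
        r_parts[bucket(a)].append(a)
    for b in s:
        s_parts[bucket(b)].append(b)

    pairs, comparisons = [], 0
    for i, rp in enumerate(r_parts):
        for j, sp in enumerate(s_parts):
            # If j < i, every b in s_parts[j] is smaller than every a in r_parts[i],
            # so a < b can never hold and the whole block of pairs is skipped.
            if j < i:
                continue
            comparisons += len(rp) * len(sp)
            pairs.extend((a, b) for a in rp for b in sp if a < b)
    return pairs, comparisons


r, s = [1, 5, 9], [2, 6, 10]
print(naive_theta_join(r, s)[1], partitioned_theta_join(r, s, 3)[1])
```

For this input the naive join compares 9 pairs, while the partitioned join compares 6 and returns the same 6 matches. With only three tuples per side the saving is small, but skipping whole partition pairs is what removes the Cartesian-product work as inputs grow.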
Abstract
As one of the most useful online processing techniques, the theta-join operation has been used by many applications to fully uncover the relationships between data streams in various scenarios. As such, constant research efforts have been devoted to optimizing its performance in the distributed environment, an effort typically framed as reducing the number of Cartesian products as much as possible. In this article, we design and implement a novel fast theta-join algorithm, called Prefap, by developing two distinct techniques, prefiltering and amalgamated partitioning, based on the state-of-the-art FastThetaJoin algorithm to optimize the efficiency of the theta-join operation. Firstly, we develop a prefiltering strategy, applied before data streams are partitioned, to reduce the amount of data involved and enable more fine-grained partitioning. Secondly, to avoid the data streams…
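The idea of prefiltering before partitioning can be illustrated with a simple bound check. This minimal sketch assumes a predicate `a < b` between two numeric streams and uses min/max bounds. It is a generic technique chosen for illustration, not necessarily Prefap's prefiltering strategy, and `prefilter_less_than` and `theta_join` are hypothetical names.

```python
def prefilter_less_than(r, s):
    """Drop tuples that cannot satisfy a < b with ANY tuple of the other stream.

    Hypothetical min/max prefilter, assumed for illustration:
    - a in r can match only if a < max(s)
    - b in s can match only if b > min(r)
    """
    if not r or not s:
        return [], []  # an empty side means the join result is empty
    max_s = max(s)
    min_r = min(r)
    r_kept = [a for a in r if a < max_s]
    s_kept = [b for b in s if b > min_r]
    return r_kept, s_kept


def theta_join(r, s):
    """Naive nested-loop evaluation of a < b (hypothetical helper)."""
    return [(a, b) for a in r for b in s if a < b]


r, s = [3, 8, 12, 20], [1, 4, 9, 15]
r_f, s_f = prefilter_less_than(r, s)
print(r_f, s_f)  # 20 and 1 are removed because they can never match
print(len(r) * len(s), len(r_f) * len(s_f))  # Cartesian-product size before and after
print(theta_join(r, s) == theta_join(r_f, s_f))  # the join result is unchanged
```

Here the Cartesian product shrinks from 16 to 9 candidate pairs and the join result stays the same. The remaining, narrower value range also lets a later partitioning step use finer partitions.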
