Streaming SQL Multi-Way Join Method for Long State Streams
Jinlong Hu, Tingfeng Qiu

TL;DR
This paper introduces UMJoin, a streaming SQL multi-way join operator that uses an LSM-Tree to handle long state streams with limited memory, together with TSC, a plan conversion method that integrates UMJoin into streaming SQL execution plans. Experiments on benchmark datasets demonstrate the approach's effectiveness.
Contribution
It proposes UMJoin, a novel multi-way stream join operator backed by disk-based storage, and TSC, a plan conversion method for integrating UMJoin into streaming SQL.
Findings
UMJoin effectively processes long state streams with limited memory.
TSC successfully converts execution plans to include UMJoin nodes.
Experimental results confirm the approach's efficiency on benchmark datasets.
Abstract
Streaming computing effectively manages large-scale streaming data in real-time, making it ideal for applications such as real-time recommendations, anomaly detection, and monitoring, all of which require immediate processing. In this context, the multi-way stream join operator is crucial, as it combines multiple data streams into a single operator, providing deeper insights through the integration of information from various sources. However, challenges related to memory limitations can arise when processing long state-based data streams, particularly in the area of streaming SQL. In this paper, we propose a streaming SQL multi-way stream join method that utilizes the LSM-Tree to address this issue. We first introduce a multi-way stream join operator called UMJoin, which employs an LSM-Tree state backend to leverage disk storage, thereby increasing the capacity for storing multi-way…
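To make the idea of a stateful multi-way stream join concrete, here is a minimal sketch in Python. It is not UMJoin itself: the abstract does not give the operator's internals, so this shows only a generic symmetric equi-join over several streams. The names `MultiWayJoin` and `on_record` are hypothetical, and an in-memory dict stands in for the disk-backed LSM-Tree state backend.

```python
from collections import defaultdict
from itertools import product


class MultiWayJoin:
    """Toy symmetric multi-way equi-join over n streams (illustrative only)."""

    def __init__(self, num_streams):
        self.num_streams = num_streams
        # One keyed state table per input stream. A plain dict is an assumed
        # stand-in for a disk-backed LSM-Tree state backend.
        self.state = [defaultdict(list) for _ in range(num_streams)]

    def on_record(self, stream_id, key, value):
        """Store the record, then return every join result it completes."""
        self.state[stream_id][key].append(value)
        candidates = []
        for i in range(self.num_streams):
            if i == stream_id:
                # Only the new record participates for its own stream.
                candidates.append([value])
            else:
                # Probe the other streams' state for the same join key.
                matches = self.state[i].get(key, [])
                if not matches:
                    return []  # at least one stream has no match yet
                candidates.append(matches)
        return [(key, *combo) for combo in product(*candidates)]


join = MultiWayJoin(3)
join.on_record(0, "k", "a")
join.on_record(1, "k", "b")
results = join.on_record(2, "k", "c")  # all three streams now hold key "k"
```

Because every arriving record is retained for future probes, the state grows with the length of the streams. That growth is the memory pressure the abstract describes, and it is what motivates moving the state to disk.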
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Database Systems and Queries · Distributed and Parallel Computing Systems
