Density-optimized Intersection-free Mapping and Matrix Multiplication for Join-Project Operations (extended version)
Zichun Huang, Shimin Chen

TL;DR
This paper introduces DIM3, an optimized algorithm for the Join-Project operation. It uses intersection-free partitioning to avoid the extra deduplication step of the prior hybrid design, along with mapping and matrix multiplication techniques tailored to Join-Project, and it outperforms previous solutions by 2.3x to 18x.
Contribution
DIM3 presents a novel intersection-free partitioning method and optimized algorithms for mapping and matrix multiplication tailored to Join-Project operations, addressing key limitations of prior work.
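To see why an intersection-free partitioning matters, the following minimal Python sketch compares two ways of splitting a tiny input. It is an illustration only and not DIM3's actual partitioning scheme, which this summary does not spell out. The relation layout `R(x, y)`, `S(z, y)`, the helper `join_project`, and the toy data are all assumptions made for the example. Splitting on the join key `y` lets two parts emit the same output pair, while splitting on the output attribute `x` cannot.

```python
from collections import defaultdict

def join_project(R, S):
    """Distinct (x, z) pairs such that some y has (x, y) in R and (z, y) in S."""
    s_by_y = defaultdict(set)
    for z, y in S:
        s_by_y[y].add(z)
    return {(x, z) for x, y in R for z in s_by_y.get(y, ())}

# Hypothetical toy relations: R(x, y) and S(z, y), joined on y.
R = [(1, "a"), (1, "b"), (2, "b")]
S = [(7, "a"), (7, "b"), (8, "b")]

# Split R by join key y: both parts can produce the same (x, z) pair,
# so merging them needs an extra deduplication pass.
by_y_a = join_project([t for t in R if t[1] == "a"], S)
by_y_b = join_project([t for t in R if t[1] == "b"], S)
overlap_by_y = by_y_a & by_y_b

# Split R by output attribute x: every output pair carries its x value,
# so the parts are disjoint by construction and can simply be concatenated.
by_x_1 = join_project([t for t in R if t[0] == 1], S)
by_x_2 = join_project([t for t in R if t[0] == 2], S)
overlap_by_x = by_x_1 & by_x_2

print("overlap when splitting on y:", overlap_by_y)
print("overlap when splitting on x:", overlap_by_x)
```

In this toy input, the pair `(1, 7)` is reachable through both `y = "a"` and `y = "b"`, which is exactly the kind of cross-partition duplicate that forces a final deduplication step.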
Findings
DIM3 outperforms previous solutions by 2.3x to 18x.
It achieves orders of magnitude speedups over RDBMSs.
Experimental results validate the efficiency and scalability of DIM3.
Abstract
A Join-Project operation is a join operation followed by a duplicate-eliminating projection operation. It is used in a large variety of applications, including entity matching, set analytics, and graph analytics. Previous work proposes a hybrid design that uses the classical solution (i.e., join and deduplication) and MM (matrix multiplication) to process the sparse and the dense portions of the input data, respectively. However, we observe three problems in the state-of-the-art solution: 1) the outputs of the sparse and dense portions overlap, requiring an extra deduplication step; 2) its table-to-matrix transformation makes an oversimplified assumption about the attribute values; and 3) there is a mismatch between the MM employed in BLAS packages and the characteristics of the Join-Project operation. In this paper, we propose DIM3, an optimized algorithm for the Join-Project…
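The two building blocks named in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the relation layout `R(x, y)`, `S(z, y)`, the function names, and the toy data are hypothetical, and the matrix product is a plain Boolean triple loop rather than a BLAS call. The helper `dense_ids` shows one simple way to map arbitrary attribute values to matrix indices without assuming they are already small contiguous integers.

```python
from collections import defaultdict

def join_project_classical(R, S):
    """Classical route: hash join on y, then deduplicate the (x, z) pairs."""
    s_by_y = defaultdict(list)
    for z, y in S:
        s_by_y[y].append(z)
    out = set()  # the set performs the duplicate elimination
    for x, y in R:
        for z in s_by_y.get(y, ()):
            out.add((x, z))
    return out

def dense_ids(values):
    """Map arbitrary hashable values to indices 0..n-1 in first-seen order."""
    return {v: i for i, v in enumerate(dict.fromkeys(values))}

def join_project_mm(R, S):
    """Matrix route: Boolean product of A[x][y] and B[y][z]."""
    xs = dense_ids(x for x, _ in R)
    ys = dense_ids([y for _, y in R] + [y for _, y in S])
    zs = dense_ids(z for z, _ in S)
    A = [[0] * len(ys) for _ in xs]
    B = [[0] * len(zs) for _ in ys]
    for x, y in R:
        A[xs[x]][ys[y]] = 1
    for z, y in S:
        B[ys[y]][zs[z]] = 1
    x_of, z_of = list(xs), list(zs)  # index -> original value
    out = set()
    for i, row in enumerate(A):
        for j in range(len(zs)):
            # C[i][j] is nonzero iff some y connects x_of[i] and z_of[j].
            if any(row[k] and B[k][j] for k in range(len(ys))):
                out.add((x_of[i], z_of[j]))
    return out

# Hypothetical toy input; the join yields (10, 100) twice before deduplication.
R = [(10, "k1"), (10, "k2"), (20, "k2"), (30, "k3")]
S = [(100, "k1"), (100, "k2"), (200, "k2"), (300, "k4")]

classical = join_project_classical(R, S)
via_mm = join_project_mm(R, S)
print(sorted(classical))
```

Both routes return the same set of pairs. The matrix route never materializes duplicate join results, which is why it suits dense inputs, while the classical route avoids allocating a mostly empty matrix when the input is sparse.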
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Data Quality and Management
