Efficient Candidate-Free R-S Set Similarity Joins with Filter-and-Verification Trees on MapReduce
Yuhong Feng, Fangcao Jian, Yixuan Cao, Xiaobin Jian, Jia Wang, Haiyue Feng, Chunyan Miao

TL;DR
This paper introduces candidate-free R-S set similarity join algorithms built on filter-and-verification trees, which reduce I/O and verification overhead and improve performance on large datasets under MapReduce.
Contribution
It proposes novel candidate-free algorithms with filter-and-verification trees that unify filtering and verification, enhancing efficiency over existing methods.
Findings
MR-CF-RS-Join/LFVT outperforms baselines by up to 15.78x
Algorithms reduce I/O and verification overhead
Effective on large real-world datasets
Abstract
Given two different collections of sets R and S, the exact R-S set similarity join (R-S Join) finds all set pairs with similarity no less than a given threshold, an operation with widespread applications. Existing algorithms accelerate large-scale R-S Joins using a two-stage filter-and-verification framework together with the parallel and distributed MapReduce framework; however, they suffer from excessive candidate set pairs (candidates), leading to significant I/O and verification overhead. This paper proposes novel candidate-free R-S Join (CF-RS-Join) algorithms that integrate filtering and verification into a single stage through the filter-and-verification tree (FVT) and its linear variant (LFVT). First, CF-RS-Join with FVT (CF-RS-Join/FVT) is proposed to leverage an innovative FVT structure that compresses elements and associated sets in memory, enabling single-stage processing that…
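To make the problem statement concrete, here is a minimal Python sketch of an exact R-S Join and of the conventional two-stage filter-and-verification pattern the abstract contrasts against. It is not the paper's FVT or LFVT algorithm and does not use MapReduce. Jaccard similarity, the size-based filter, and the names jaccard, naive_rs_join, and filter_then_verify are illustrative assumptions; the abstract does not name a specific similarity function or filter.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| (assumed measure; 0.0 if both sets are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def naive_rs_join(R, S, threshold):
    """Brute force: compare every set in R with every set in S."""
    return sorted(
        (r_id, s_id)
        for (r_id, r), (s_id, s) in product(R.items(), S.items())
        if jaccard(r, s) >= threshold
    )


def filter_then_verify(R, S, threshold):
    """Two-stage pattern: a cheap filter emits candidates, then each is verified.

    The size filter is a generic stand-in: Jaccard can never exceed
    min(|r|, |s|) / max(|r|, |s|), so pairs failing the check are safe to drop.
    """
    candidates = [
        (r_id, s_id)
        for (r_id, r), (s_id, s) in product(R.items(), S.items())
        if min(len(r), len(s)) >= threshold * max(len(r), len(s))
    ]
    results = sorted(
        (r_id, s_id)
        for r_id, s_id in candidates
        if jaccard(R[r_id], S[s_id]) >= threshold
    )
    return candidates, results


# Hypothetical toy collections, keyed by set id.
R = {"r1": {1, 2, 3, 4}, "r2": {5, 6}}
S = {"s1": {1, 2, 3, 5}, "s2": {5, 6, 7}, "s3": {8}}

candidates, results = filter_then_verify(R, S, 0.5)
print(len(candidates), results)
```

On this toy input the filter still passes more candidate pairs than there are true results, which is the kind of surplus the abstract identifies as the source of I/O and verification overhead.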
Taxonomy
Topics: Fuzzy Logic and Control Systems · Neural Networks and Applications · Face and Expression Recognition
