1-D and 2-D Parallel Algorithms for All-Pairs Similarity Problem
Eray Özkural, Cevdet Aykanat

TL;DR
This paper explores parallel algorithms for the all-pairs similarity problem, proposing 1-D and 2-D data distribution strategies that optimize performance for data mining tasks.
Contribution
It introduces novel 1-D and 2-D parallel algorithms, implemented in OCaml, whose data distribution and pruning techniques aim to improve scalability and efficiency.
Findings
Performance varies with dataset characteristics
1-D vertical distribution reduces candidate count
2-D distribution offers flexible parallelization
Abstract
The all-pairs similarity problem asks for all pairs of vectors in a set whose similarity surpasses a given threshold; it is a computational kernel for several tasks in data mining and information retrieval. We investigate the parallelization of a recent fast sequential algorithm. We propose effective 1-D and 2-D data distribution strategies that preserve the essential optimizations of the fast algorithm. The 1-D parallel algorithms distribute either dimensions or vectors, whereas the 2-D parallel algorithm distributes data both ways. Additional contributions to the 1-D vertical distribution include a local pruning strategy that reduces the number of candidates, a recursive pruning algorithm, and block processing to reduce load imbalance. The parallel algorithms were programmed in OCaml, which affords much convenience. Our experiments indicate that the performance depends…
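To make the problem statement and the idea of distributing dimensions concrete, here is a minimal Python sketch. It is not the authors' OCaml implementation and it omits every pruning optimization the abstract mentions. It assumes sparse vectors stored as dictionaries, the dot product as the similarity measure, and "surpasses" read as "greater than or equal to". The function names and the round-robin assignment of dimensions to parts are hypothetical choices made for illustration, and the parts are processed in a sequential loop rather than on separate processors.

```python
from itertools import combinations

def dot(u, v):
    """Sparse dot product; a vector is a dict mapping dimension -> weight."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v[d] for d, w in u.items() if d in v)

def all_pairs(vectors, t):
    """Brute-force baseline: every pair (i, j), i < j, with similarity >= t."""
    result = {}
    for i, j in combinations(range(len(vectors)), 2):
        s = dot(vectors[i], vectors[j])
        if s >= t:
            result[(i, j)] = s
    return result

def split_by_dimension(vectors, p):
    """Assumed 1-D vertical split: part k keeps the dimensions d with d % p == k."""
    return [[{d: w for d, w in v.items() if d % p == k} for v in vectors]
            for k in range(p)]

def all_pairs_vertical(vectors, t, p):
    """Each part contributes partial scores; summing them gives the full dot product."""
    partial = {}
    for part in split_by_dimension(vectors, p):  # stands in for one processor
        for i, j in combinations(range(len(vectors)), 2):
            s = dot(part[i], part[j])
            if s:
                partial[(i, j)] = partial.get((i, j), 0.0) + s
    # The threshold can only be applied after the partial scores are combined.
    return {pair: s for pair, s in partial.items() if s >= t}

# Illustrative data only (unit-length vectors, so the dot product is the cosine).
vectors = [
    {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5},
    {0: 0.5, 1: 0.5, 2: 0.5, 4: 0.5},
    {4: 1.0},
    {2: 0.5, 3: 0.5, 4: 0.5, 5: 0.5},
]

print(all_pairs(vectors, 0.6))
print(all_pairs_vertical(vectors, 0.6, 2))
```

In this sketch no single part sees a complete score, so every pair with a nonzero partial score has to be kept as a candidate until the parts are combined. That is one plausible reading of why the abstract singles out candidate reduction for the 1-D vertical distribution.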
Taxonomy
Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression
