GPU Accelerated Similarity Self-Join for Multi-Dimensional Data
Michael Gowanlock, Ben Karsin

TL;DR
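This paper presents a GPU-accelerated algorithm for similarity self-joins on high-dimensional data. It combines a grid-based index with three optimizations (a trade-off between candidate set filtering and index search overhead, variance-based data reordering, and pruning of distance calculations) to improve performance and scalability on modern heterogeneous systems.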
Contribution
It introduces a novel GPU-optimized self-join algorithm with specific indexing and pruning techniques, and demonstrates scalable performance on multi-GPU and distributed systems.
Findings
Outperforms state-of-the-art GPU self-join methods on real-world datasets
Effective data reordering enhances filtering power and reduces computations
Achieves scalable performance on multi-GPU and distributed-memory architectures
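The reordering and reduced-computation findings above can be illustrated with a small CPU-side sketch. It assumes that "reordering" means permuting dimensions so the highest-variance ones come first, and it uses early termination of the distance sum as one common way to skip work; the summary does not say this is the paper's exact pruning method. The function names `reorder_by_variance` and `within_eps` are hypothetical.

```python
import statistics


def reorder_by_variance(points):
    """Permute dimensions so the highest-variance dimension comes first.

    Assumption: this is one plausible reading of variance-based reordering,
    not a transcription of the paper's procedure.
    """
    dims = len(points[0])
    variances = [statistics.pvariance([p[d] for p in points]) for d in range(dims)]
    # Dimension indices sorted by descending variance.
    order = sorted(range(dims), key=lambda d: variances[d], reverse=True)
    reordered = [tuple(p[d] for d in order) for p in points]
    return reordered, order


def within_eps(a, b, eps):
    """Return True if the Euclidean distance between a and b is <= eps.

    The running sum of squared differences is compared against eps**2 after
    each dimension, so the loop stops as soon as the pair is ruled out.
    """
    limit = eps * eps
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
        if total > limit:
            return False
    return True
```

Putting high-variance dimensions first tends to make non-matching pairs exceed the threshold earlier in the loop, which is why the two ideas are shown together.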
Abstract
The self-join finds all objects in a dataset that are within a search distance, epsilon, of each other; therefore, the self-join is a building block of many algorithms. We advance a GPU-accelerated self-join algorithm targeted towards high-dimensional data. The massive parallelism afforded by the GPU and its high aggregate memory bandwidth make the architecture well-suited for data-intensive workloads. We leverage a grid-based, GPU-tailored index to perform range queries. We propose the following optimizations: (i) a trade-off between candidate set filtering and index search overhead by exploiting properties of the index; (ii) reordering the data based on variance in each dimension to improve the filtering power of the index; and (iii) a pruning method for reducing the number of expensive distance calculations. Across most scenarios on real-world and synthetic datasets, our algorithm…
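To make the self-join definition and the role of a grid index concrete, here is a minimal single-threaded sketch. It assumes Euclidean distance and grid cells of side length epsilon, so that any match for a point lies in its own cell or an adjacent one. It is not the paper's GPU implementation, and the function name `grid_self_join` is hypothetical.

```python
from collections import defaultdict
from itertools import product
import math


def grid_self_join(points, eps):
    """Return all index pairs (i, j), i < j, with Euclidean distance <= eps."""
    if not points:
        return []

    # Bin each point into a grid cell of side length eps.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / eps) for x in p)
        cells[key].append(idx)

    # Points within eps of each other differ by at most one cell per dimension,
    # so only the 3**dims adjacent cells (including the cell itself) are searched.
    dims = len(points[0])
    offsets = list(product((-1, 0, 1), repeat=dims))

    pairs = []
    for key, members in cells.items():
        for off in offsets:
            neighbor = tuple(k + o for k, o in zip(key, off))
            # .get avoids creating empty cells while iterating over the dict.
            for j in cells.get(neighbor, ()):
                for i in members:
                    if i < j and math.dist(points[i], points[j]) <= eps:
                        pairs.append((i, j))
    return sorted(pairs)
```

The number of adjacent cells grows as 3 to the power of the dimensionality, so this naive form is only practical in low dimensions; that growth is one reason an index for high-dimensional data has to balance filtering power against search overhead.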
Taxonomy
Topics: Data Management and Algorithms · Advanced Image and Video Retrieval Techniques · Caching and Content Delivery
