PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures
Md. Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, Evan Racah, Suren Byna, Craig Tull, Wahid Bhimji, Prabhat, Pradeep Dubey

TL;DR
This paper introduces a highly optimized parallel kd-tree based KNN algorithm for distributed systems, capable of handling billions of data points efficiently, significantly outperforming previous methods and scaling well on modern hardware.
Contribution
The paper presents novel parallel kd-tree algorithms with improved pruning, load balancing, and partitioning for distributed architectures, enabling scalable KNN computations on massive datasets.
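For readers unfamiliar with the underlying technique, here is a minimal single-node sketch of a textbook kd-tree with pruned KNN search. It is not the paper's distributed algorithm: the names `Node`, `build`, `sq_dist`, and `knn` are hypothetical, the median split by full sort is a simplification, and nothing here covers load balancing or partitioning across nodes and threads. It only illustrates how a bounded max-heap lets the search skip subtrees that cannot contain a closer neighbor.

```python
import heapq
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class Node:
    """One kd-tree node: a splitting point, its split axis, and two subtrees."""

    def __init__(self, point: Point, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by cycling through axes and splitting at the median.

    Sorting at every level is a simplification chosen for clarity.
    """
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (avoids the square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(root: Optional[Node], query: Point, k: int) -> List[Point]:
    """Return the k nearest points to `query`, closest first."""
    heap: List[Tuple[float, Point]] = []  # max-heap via negated squared distance

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d2 = sq_dist(node.point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d2, node.point))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, node.point))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Pruning: descend into the far side only if the splitting plane is
        # closer than the current k-th best distance (or the heap is not full).
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [p for _, p in sorted(heap, key=lambda t: -t[0])]
```

A small usage example under the same assumptions:

```python
pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
print(knn(build(pts), (9, 2), 2))  # prints [(8, 1), (7, 2)]
```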
Findings
Constructed a kd-tree of 189 billion particles in 48 seconds using ~50,000 cores.
Computed KNN for 19 billion queries in 12 seconds.
Achieved almost linear speedup on shared and distributed memory systems.
Abstract
Computing k-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based algorithms have been proposed for computing KNN, due to its inherent sequentiality, linear algorithms are being used in practice. This limits the applicability of such methods to millions of data points, with limited scalability for Big Data analytics challenges in the scientific domain. In this paper, we present parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures. Our algorithm includes novel approaches for pruning search space and improving load balancing and partitioning among nodes and threads. Using TB-sized datasets from three science applications: astrophysics, plasma physics, and particle physics, we show that our…
