UpANNS: Enhancing Billion-Scale ANNS Efficiency with Real-World PIM Architecture
Sitian Chen, Amelie Chi Zhou, Yucheng Shi, Yusen Li, Xin Yao

TL;DR
UpANNS is a Processing-in-Memory (PIM) framework that accelerates billion-scale ANNS: it delivers 4.3x higher QPS than CPU-based Faiss and matches GPU performance at 2.3x greater energy efficiency, which makes it a candidate for real-time AI applications.
Contribution
The paper presents UpANNS, a novel PIM architecture-aware framework with four innovations that substantially improve efficiency and scalability of billion-scale ANNS.
Findings
4.3x higher QPS than CPU-based Faiss
Matches GPU performance with 2.3x greater energy efficiency
Near-linear scalability for large datasets
Abstract
Approximate Nearest Neighbor Search (ANNS) is a critical component of modern AI systems, such as recommendation engines and retrieval-augmented large language models (RAG-LLMs). However, scaling ANNS to billion-entry datasets exposes critical inefficiencies: CPU-based solutions are bottlenecked by memory bandwidth limitations, while GPU implementations underutilize hardware resources, leading to suboptimal performance and energy consumption. To address these challenges, we introduce UpANNS, a novel framework leveraging Processing-in-Memory (PIM) architecture to accelerate billion-scale ANNS. UpANNS integrates four key innovations, including 1) architecture-aware data placement to minimize latency through workload balancing, 2) dynamic resource management for optimal PIM utilization, 3) co-occurrence optimized encoding to reduce redundant computations, and 4) an early-pruning…
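The abstract's first innovation, data placement that balances workload across PIM units, can be illustrated with a generic load-balancing sketch. This is not the paper's algorithm: it is a standard greedy longest-processing-time heuristic, and the function name `place_clusters`, the notion of a per-cluster "load" (for example, cluster size times expected query frequency), and the example numbers are all assumptions made for illustration.

```python
import heapq


def place_clusters(cluster_loads, num_units):
    """Assign clusters to processing units with a greedy LPT heuristic.

    cluster_loads: dict mapping a hypothetical cluster id to its estimated load.
    num_units: number of processing units (an assumed stand-in for PIM units).
    Returns (placement, unit_loads).
    """
    # Min-heap of (accumulated load, unit id): the least-loaded unit is on top.
    heap = [(0.0, unit) for unit in range(num_units)]
    heapq.heapify(heap)

    placement = {}
    # Heaviest clusters first; ties broken by cluster id for determinism.
    order = sorted(cluster_loads, key=lambda c: (-cluster_loads[c], c))
    for cluster in order:
        load, unit = heapq.heappop(heap)
        placement[cluster] = unit
        heapq.heappush(heap, (load + cluster_loads[cluster], unit))

    # Recompute the per-unit totals from the final placement.
    unit_loads = [0.0] * num_units
    for cluster, unit in placement.items():
        unit_loads[unit] += cluster_loads[cluster]
    return placement, unit_loads


if __name__ == "__main__":
    # Illustrative loads only; not taken from the paper.
    loads = {0: 8, 1: 7, 2: 6, 3: 5, 4: 4}
    placement, unit_loads = place_clusters(loads, num_units=2)
    print(placement, unit_loads)
```

Because a query batch finishes only when the slowest unit finishes, keeping the maximum per-unit load low is what reduces latency. A real system would also have to account for per-unit memory capacity and shifting query distributions, which this sketch ignores.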
Taxonomy
Topics: Modular Robots and Swarm Intelligence
