UpANNS: Enhancing Billion-Scale ANNS Efficiency with Real-World PIM Architecture

Sitian Chen; Amelie Chi Zhou; Yucheng Shi; Yusen Li; Xin Yao

arXiv:2410.23805 · cs.AR · August 21, 2025

TL;DR

UpANNS is a PIM-based framework that accelerates billion-scale ANNS. It delivers 4.3x higher QPS than CPU-based Faiss and matches GPU performance with 2.3x greater energy efficiency, which makes it a candidate for real-time AI applications.

Contribution

The paper presents UpANNS, a PIM-architecture-aware framework whose four innovations, among them architecture-aware data placement, dynamic resource management, and co-occurrence-optimized encoding, improve the efficiency and scalability of billion-scale ANNS.

Findings

1) 4.3x higher QPS than CPU-based Faiss

2) Matches GPU performance with 2.3x greater energy efficiency

3) Near-linear scalability for large datasets

Abstract

Approximate Nearest Neighbor Search (ANNS) is a critical component of modern AI systems, such as recommendation engines and retrieval-augmented large language models (RAG-LLMs). However, scaling ANNS to billion-entry datasets exposes critical inefficiencies: CPU-based solutions are bottlenecked by memory bandwidth limitations, while GPU implementations underutilize hardware resources, leading to suboptimal performance and energy consumption. To address these challenges, we introduce UpANNS, a novel framework leveraging Processing-in-Memory (PIM) architecture to accelerate billion-scale ANNS. UpANNS integrates four key innovations, including 1) architecture-aware data placement to minimize latency through workload balancing, 2) dynamic resource management for optimal PIM utilization, 3) co-occurrence-optimized encoding to reduce redundant computations, and 4) an early-pruning…
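
The abstract lists workload-balanced data placement as one of the four innovations but does not describe the algorithm here. As a rough illustration of the general idea only, and not the authors' method, the sketch below uses a generic greedy longest-processing-time heuristic: clusters are sorted by estimated load and each is assigned to the currently least-loaded processing unit. The names `place_clusters`, `cluster_loads`, and `num_units`, and the load values in the usage lines, are hypothetical and chosen purely for illustration.

```python
import heapq


def place_clusters(cluster_loads, num_units):
    """Greedy load balancing (illustrative assumption, not the paper's algorithm).

    cluster_loads: dict mapping a cluster id to its estimated query load.
    num_units: number of processing units to spread the clusters over.
    Returns (placement, unit_loads), both keyed by unit index.
    """
    # Min-heap of (accumulated load, unit index): the least-loaded unit is on top.
    heap = [(0, unit) for unit in range(num_units)]
    heapq.heapify(heap)
    placement = {unit: [] for unit in range(num_units)}

    # Heaviest clusters first, each onto the currently lightest unit.
    ordered = sorted(cluster_loads.items(), key=lambda kv: kv[1], reverse=True)
    for cluster_id, load in ordered:
        total, unit = heapq.heappop(heap)
        placement[unit].append(cluster_id)
        heapq.heappush(heap, (total + load, unit))

    unit_loads = {
        unit: sum(cluster_loads[c] for c in clusters)
        for unit, clusters in placement.items()
    }
    return placement, unit_loads


# Hypothetical loads for five clusters spread over two units.
loads = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
placement, unit_loads = place_clusters(loads, num_units=2)
print(placement, unit_loads)
```

The heuristic is simple and fast but not optimal in general. A real PIM system would also have to account for per-unit memory capacity and data transfer costs, which this sketch ignores.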

Taxonomy

Topics: Modular Robots and Swarm Intelligence