Graph-based Nearest Neighbors with Dynamic Updates via Random Walks
Nina Mishra, Yonatan Naamad, Tal Wagner, Lichen Zhang

TL;DR
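This paper introduces a random-walk framework for graph-based approximate nearest neighbor (ANN) search that supports efficient deletion of data points with minimal impact on search quality. HNSW does not support deletion, and prior deletion methods cost query latency, recall, or deletion time.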
Contribution
It proposes a novel theoretical framework for graph-based ANN with randomized and deterministic deletion algorithms that improve tradeoffs in query latency, recall, and deletion time.
Findings
The randomized deletion approach preserves hitting time statistics relative to the graph before deletion.
The proposed methods outperform existing approaches in experiments.
They offer a better tradeoff between query latency, recall, and deletion time.
Abstract
Approximate nearest neighbor search (ANN) is a common way to retrieve relevant search results, especially now in the context of large language models and retrieval augmented generation. One of the most widely used algorithms for ANN is based on constructing a multi-layer graph over the dataset, called the Hierarchical Navigable Small World (HNSW). While this algorithm supports insertion of new data, it does not support deletion of existing data. Moreover, deletion algorithms described by prior work come at the cost of increased query latency, decreased recall, or prolonged deletion time. In this paper, we propose a new theoretical framework for graph-based ANN based on random walks. We then utilize this framework to analyze a randomized deletion approach that preserves hitting time statistics compared to the graph before deleting the point. We then turn this theoretical framework into a…
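The two objects the abstract relies on, a random walk over a neighbor graph and its hitting time, can be illustrated with a small sketch. Everything here is an assumption made for illustration rather than the paper's definition: the transition rule (a softmax over negative distance to the query, with a hypothetical temperature parameter `beta`), the toy path graph, and the function names. With a large `beta` the walk behaves like greedy search; with `beta = 0` it is a uniform random walk.

```python
import math
import random


def softmax_step(graph, points, query, node, beta, rng):
    """Move to a neighbor with probability proportional to exp(-beta * distance to query)."""
    nbrs = graph[node]
    dists = [math.dist(points[v], query) for v in nbrs]
    d_min = min(dists)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (d - d_min)) for d in dists]
    return rng.choices(nbrs, weights=weights, k=1)[0]


def hitting_time(graph, points, query, start, target, beta, rng, max_steps=10_000):
    """Steps taken until the walk first reaches `target`; None if the step cap is hit."""
    node = start
    for step in range(max_steps + 1):
        if node == target:
            return step
        node = softmax_step(graph, points, query, node, beta, rng)
    return None


def mean_hitting_time(graph, points, query, start, target, beta, trials=2000, seed=0):
    """Monte Carlo estimate of the expected hitting time over independent walks."""
    rng = random.Random(seed)
    times = [hitting_time(graph, points, query, start, target, beta, rng)
             for _ in range(trials)]
    times = [t for t in times if t is not None]
    return sum(times) / len(times)


# Toy undirected path graph 0 - 1 - 2 - 3 with points on a line (illustrative data).
points = {0: (0.0,), 1: (1.0,), 2: (2.0,), 3: (3.0,)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
query = (3.1,)  # nearest neighbor of the query is node 3

greedy_like = mean_hitting_time(graph, points, query, 0, 3, beta=50.0)
uniform = mean_hitting_time(graph, points, query, 0, 3, beta=0.0)
print(greedy_like, uniform)
```

Comparing such hitting time estimates on a graph before and after removing a node is one way to read the abstract's claim that deletion "preserves hitting time statistics". The sketch does not implement any deletion algorithm.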
Peer Reviews
Decision: ICLR 2026 Poster
### Paper Strengths

S1. Efficient, high-recall dynamic deletion is one of the most significant unsolved challenges for graph-based ANN indexes in production. The paper’s motivation is strong and highly relevant.
S2. The idea of modeling a graph’s greedy search as a “softmax walk” is clever, bridging a heuristic algorithm with formal random walk theory.
S3. The paper includes comparisons against mainstream baselines and demonstrates competitive performance across recall, query speed, deletion

### Paper Weaknesses

W1. The paper’s theoretical foundation is built on undirected graphs, whereas mainstream implementations use directed graphs, leaving a gap that the paper does not justify. (See D1, D2)
W2. The algorithm does not clearly describe how it handles in-neighbors of a deleted node. (See D3)
W3. The algorithm is only validated on HNSW, and its robustness on other graph indexes (e.g., NSG, DiskANN) is unknown. (See D4)
W4. The algorithm section includes excessive theoretical for
I see two main strengths in this paper: The first is the novel perspective of interpreting the greedy walk through the lens of a softmax walk. This conceptual shift provides valuable intuition and a unifying view of the search process. Building on this softmax interpretation, the paper proposes a graph sparsification approach, where handling deletions efficiently naturally corresponds to sparsifying a complete weighted graph. These viewpoints together lead to a practical and effective deletio
1) I feel there is a disconnect between the theory and the guarantees achieved by the ANN procedure. For instance, I would have liked to see a theorem of the following kind: for any query q, if the greedy walk on G achieves an alpha-approximation (or a given recall), then the softmax walk achieves the same approximation or recall with high probability. A similar statement would say that the greedy walk or softmax walk on the sparsified G' achieves similar approximation factors.
2) Most of the theorem statements are simple. I am not so impressed with
S1: Deletion is an important operation for dynamic similarity graphs, and any related research is welcome.
S2: The theoretical analysis of deletion in similarity graphs seems reasonable.
W1. A highly related work, DEG [a], is missing. In [a], the authors also proposed a similarity graph capable of handling all updates, including deletions. It is recommended that the proposed approach, Spatch, be compared with [a] thoroughly in terms of both design and performance.
W2. The authors use rebuild as the baseline. However, a recently proposed approach [b] can rebuild the HNSW index much faster without impairing search performance. The authors are recommended to implement rebuild acco
Taxonomy
Topics: Information Retrieval and Search Behavior · Advanced Image and Video Retrieval Techniques · Web Data Mining and Analysis
