Efficient Algorithms for Adversarially Robust Approximate Nearest Neighbor Search
Alexandr Andoni, Themistoklis Haris, Esty Kelman, Krzysztof Onak

TL;DR
This paper develops new algorithms for approximate nearest neighbor search under adversarial conditions, combining fairness, differential privacy, and innovative data structures to improve robustness and performance in high and low dimensions.
Contribution
It introduces a novel connection between adaptive security and fairness, and proposes concentric-annuli LSH to surpass query time barriers, advancing adversarially robust ANN algorithms.
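To illustrate why fairness can help against an adaptive adversary, here is a minimal sketch of an exact fair near-neighbor query: it returns a uniformly random point within distance r of the query. The answer distribution then depends only on the dataset and the query, not on any randomness inside an index. This is a brute-force linear scan, not the paper's data structure, and the name `fair_near_neighbor` and its parameters are hypothetical.

```python
import math
import random


def fair_near_neighbor(points, query, r, rng):
    """Return a uniformly random point within distance r of query, or None.

    Brute-force illustration only: every point in the r-ball is equally
    likely to be returned, so the output reveals nothing about how an
    index might have been built.
    """
    near = [p for p in points if math.dist(p, query) <= r]
    if not near:
        return None
    return rng.choice(near)


# Toy dataset (made up for this example): three points near the origin, one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
rng = random.Random(0)
sample = fair_near_neighbor(points, (0.1, 0.1), 1.0, rng)
```

Repeated calls with the same query sample independently from the same three near points, so observing past answers gives an adversary no handle on internal state.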
Findings
New algorithms for high-dimensional adversarial ANN with improved guarantees
A concentric-annuli LSH construction that breaks the $\sqrt{n}$ query time barrier
Enhanced fair ANN results and simplified metric covering constructions in low dimensions
Abstract
We study the Approximate Nearest Neighbor (ANN) problem under a powerful adaptive adversary that controls both the dataset and a sequence of queries. Primarily, for the high-dimensional regime of , we introduce a sequence of algorithms with progressively stronger guarantees. We first establish a novel connection between adaptive security and fairness, leveraging fair ANN search to hide internal randomness from the adversary with information-theoretic guarantees. To achieve data-independent performance, we then reduce the search problem to a robust decision primitive, solved using a differentially private mechanism on a Locality-Sensitive Hashing (LSH) data structure. This approach, however, faces an inherent query time barrier. To break the barrier, we propose a novel concentric-annuli LSH construction that synthesizes these fairness and…
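The "robust decision primitive" idea can be sketched roughly as follows: keep several independent LSH copies, let each vote on whether a near point exists, and release only a noisy thresholded vote count so that no single copy's randomness is exposed much. This is a simplified sketch under stated assumptions, not the paper's construction: the hyperplane hash, the Laplace-noised majority vote, and all names (`LSHCopy`, `noisy_decision`, `epsilon`) are illustrative choices.

```python
import math
import random


def make_hash(dim, k, rng):
    """Build a hash from k random hyperplanes (sign pattern of k projections)."""
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(k)]

    def h(x):
        return tuple(sum(a * b for a, b in zip(p, x)) >= 0 for p in planes)

    return h


class LSHCopy:
    """One independent hash table over the dataset."""

    def __init__(self, points, dim, k, rng):
        self.h = make_hash(dim, k, rng)
        self.buckets = {}
        for p in points:
            self.buckets.setdefault(self.h(p), []).append(p)

    def has_near(self, q, r):
        # Checks only the query's bucket, then verifies the true distance.
        return any(math.dist(p, q) <= r for p in self.buckets.get(self.h(q), []))


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_decision(copies, q, r, epsilon, rng):
    """Answer 'is there a point within r of q?' via a Laplace-noised vote.

    Changing one copy changes the vote count by at most 1, so the noise
    scale 1/epsilon matches sensitivity 1.
    """
    votes = sum(c.has_near(q, r) for c in copies)
    return votes + laplace(1 / epsilon, rng) >= len(copies) / 2
```

A hypothetical usage: build `copies = [LSHCopy(points, dim, k, rng) for _ in range(m)]` once, then call `noisy_decision(copies, q, r, epsilon, rng)` per query. Smaller `epsilon` hides more about the copies at the cost of noisier answers.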
Peer Reviews
Decision·Submitted to ICLR 2026
- I find it appealing that Theorem 1.2 and 1.3 achieve runtime and space bounds independent of dataset-specific quantities (i.e. $s$ in [Feng'25] or $D$ in Theorem 1.1). This makes performance predictable on worst-case datasets and avoids hidden inefficiency due to dense neighborhoods. The search-to-decision reduction that enables this isolates the leakage channel and patches it with a DP mechanism, which feels natural and standard, but is executed nicely.
- (Subject to correctness,) the fairne
I only checked the first proof (fairness implies robustness) and I'm confused about the following point: In Definition 2.1, both $R$ and $R_{setup}$ are used to denote the randomness used to *initialize* the data structure. So I assume that $R = R_{setup}$ and write $W_i := (R_{setup}, R_1, \cdots , R_i)$. My question concerns Definition 3.1: is the ith answer $a_i$ independent of $(a_1, \cdots, a_{i-1})$ and $R$, or is $a_i$ independent of $(a_1, \cdots, a_{i-1})$ and $W_{i-1}$? Definition 3.1
1. The observation that fairness implies robustness is conceptually elegant and powerful. Its applicability extends beyond the ANN problem and may motivate further exploration of fairness-based defenses in other algorithmic settings.
2. Theorem 3, in particular, presents a strong result for adversarially robust ANN, improving upon prior work and breaking the $\sqrt{n}$ barrier under mild assumptions.
3. In addition to high-dimensional settings, the authors also provide results for low-dimensio
1. Both Theorems 2 and 3 include a $\sqrt{Q}$ factor, which can be significant when the number of adaptive queries is large. While Theorem 1 avoids this factor, it introduces dependence on data density, and it remains unclear whether the $\sqrt{Q}$ dependence can be eliminated in the general case.
2. The adversary is assumed to fix the dataset in advance but may adaptively choose queries throughout execution. The paper does not provide sufficient justification for this threat model, and it is n
1. This work establishes and proves (Claim 3.3) the core theoretical result that exact fair ANN algorithms are adversarially robust.
2. The paper presents a thoughtful progression from fairness-induced robustness (Theorem 1.1) to assumption-free methods via bucketing (Theorem 1.2), culminating in the concentric annuli construction (Theorem 1.3) that achieves sublinear query time even in worst-case datasets.
3. Table 1 provides a summary of algorithmic tradeoffs (query time, space) under varying
1. This work is entirely theoretical. It lacks experiments to support the theoretical results. I suggest that the authors include some experiments to support the claims.
2. Claims are ambiguous. For instance, Theorem 1.1 and Theorem 1.2 offload much of the complexity into density ratios, while it remains unclear how these density ratios behave in real application scenarios.
3. In the for-all algorithms (Sec. 1.1.2), how severe is the intractability when $d$ is only modestly large (
Taxonomy
Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Advanced Image and Video Retrieval Techniques
