Probabilistic Kernel Function for Fast Angle Testing
Kejing Lu, Chuan Xiao, Yoshiharu Ishikawa

TL;DR
This paper introduces two projection-based probabilistic kernel functions for angle testing in high-dimensional similarity search. Their projection vectors have a deterministic structure, and they outperform Gaussian-based methods without requiring asymptotic assumptions.
Contribution
The paper proposes two projection-based probabilistic kernel functions, one for angle comparison and one for angle thresholding. Both use deterministically structured projection vectors and outperform traditional Gaussian-distribution-based kernels.
Findings
Achieves 2.5x--3x higher QPS compared to HNSW
Does not rely on asymptotic assumptions
Outperforms Gaussian-distribution-based kernels
Abstract
In this paper, we study the angle testing problem in the context of similarity search in high-dimensional Euclidean spaces and propose two projection-based probabilistic kernel functions, one designed for angle comparison and the other for angle thresholding. Unlike existing approaches that rely on random projection vectors drawn from Gaussian distributions, our approach leverages reference angles and adopts a deterministic structure for the projection vectors. Notably, our kernel functions do not require asymptotic assumptions, such as the number of projection vectors tending to infinity, and can be theoretically and experimentally shown to outperform Gaussian-distribution-based kernel functions. We apply the proposed kernel function to Approximate Nearest Neighbor Search (ANNS) and demonstrate that our approach achieves a 2.5x--3x higher query-per-second (QPS) throughput compared to…
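For orientation, the Gaussian-projection baseline that the abstract contrasts against can be sketched as follows. This is not the paper's kernel: it is the classical sign-random-projection estimate, which uses the fact that two vectors at angle θ fall on opposite sides of a random Gaussian hyperplane with probability θ/π. The function names, the default number of projections, and the seed are assumptions made for illustration only.

```python
import math
import random


def true_angle(u, v):
    """Exact angle (radians) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def gaussian_sign_angle_estimate(u, v, num_projections=2048, seed=0):
    """Estimate the angle between u and v from Gaussian random projections.

    Hypothetical helper (not from the paper): counts how often the two
    vectors land on opposite sides of a random hyperplane, then scales
    the disagreement rate by pi.
    """
    rng = random.Random(seed)
    dim = len(u)
    disagreements = 0
    for _ in range(num_projections):
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        proj_u = sum(gi * ui for gi, ui in zip(g, u))
        proj_v = sum(gi * vi for gi, vi in zip(g, v))
        if (proj_u >= 0) != (proj_v >= 0):
            disagreements += 1
    return math.pi * disagreements / num_projections


def closer_in_angle(q, a, b, **kwargs):
    """Angle comparison: return 'a' if a is estimated to be closer to q, else 'b'."""
    est_a = gaussian_sign_angle_estimate(q, a, **kwargs)
    est_b = gaussian_sign_angle_estimate(q, b, **kwargs)
    return "a" if est_a <= est_b else "b"


if __name__ == "__main__":
    q = [1.0, 0.0, 0.0, 0.0]
    near = [math.cos(math.pi / 6), math.sin(math.pi / 6), 0.0, 0.0]
    far = [math.cos(math.pi / 2), math.sin(math.pi / 2), 0.0, 0.0]
    print("estimated angle to near:", gaussian_sign_angle_estimate(q, near))
    print("exact angle to near:    ", true_angle(q, near))
    print("closer candidate:       ", closer_in_angle(q, near, far))
```

The estimate is only accurate when many projections are used, which is the kind of asymptotic dependence the abstract says the proposed kernels avoid.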
Peer Reviews
Decision: ICLR 2026 Oral
1. Important topic and well-motivated problem. 2. Theoretically grounded approach that achieves substantial practical gains. 3. The paper is well written. 4. Source code is provided.
1. Gains over HNSW-PEO are relatively modest (yet non-trivial!), and the method requires extra space, which is non-trivial in some cases. For example, it is >= 40% in the case of the SIFT dataset. 2. Evaluation is only single-threaded. **Detailed comments:** **Please do not respond to these; all questions are rhetorical. If a suggested correction is not valid, just ignore it.** Eq. (2): Shouldn’t Z_{HS} be Z_S? L341: This is not understandable without a basic explanation of what a routing
1. This paper studies an important problem: high-dimensional similarity search. 2. A detailed theoretical analysis is presented to show the effectiveness and correctness of the proposed probabilistic kernel functions. 3. Experimental results are presented to show the empirical effectiveness of the proposed probabilistic kernel functions and algorithms. 4. Source code has been released.
The empirical performance of the proposed kernel functions is not particularly strong. KS1 produces only marginally higher recall than the CEOs technique (Pham, 2021), as shown in Table 1, while HNSW+KS2 is only 1.1 to 1.3 times faster than HNSW+PEOs. Also, why are Tiny, GIST, and SIFT omitted from Table 1? It would be good to discuss whether the proposed kernel functions are guaranteed to lead to more accurate (and/or more efficient) similarity search than Gaussian-distribution-based kernel functions given the same
The paper proposes an interesting approach, angle testing, which has a central role in approximate nearest neighbor search. The authors propose methods for both angle comparison and thresholding. The provided theory is neat and well-motivated. The proposed KS2 test can be generally applied to many different graph-based approximate nearest neighbor methods, and is amenable to an efficient SIMD implementation that yields a significant improvement in throughput when combined with the popular HNSW
In practice, the improvement provided by KS1 over CEOs is very minor. The improvement of HNSW+KS2 over the earlier HNSW+PEOs is slightly larger but still relatively small. The state-of-the-art methods for approximate nearest neighbor search combine graphs and quantization, e.g. Glass combines graphs with scalar quantization and SymphonyQG [1] combines graphs with RaBitQ, yet these are not included in the comparisons. The authors mention that they do not compare to e.g. Glass as it was deemed le
Taxonomy
Topics: Fault Detection and Control Systems · Image and Object Detection Techniques
