GARLIC: GAussian Representation LearnIng for spaCe partitioning
Panagiotis Rigas, Panagiotis Drivas, Charalambos Tzamos, Ioannis Chamodrakas, George Ioannakis, Leonidas J. Guibas, Ioannis Z. Emiris

TL;DR
GARLIC introduces an adaptive, geometry-aware Gaussian partitioning method for high-dimensional Euclidean nearest neighbor search, improving efficiency and robustness over traditional isotropic cell-based approaches.
Contribution
It proposes a novel anisotropic Gaussian partitioning approach that adapts to local data geometry and density, enhancing ANN search performance.
Findings
Reduces candidate counts and cross-cell neighbor splits.
Maintains robustness with limited training data.
Offers competitive recall-efficiency trade-offs.
Abstract
We present GARLIC, a representation learning approach for Euclidean approximate nearest neighbor (ANN) search in high dimensions. Existing partitions tend to rely on isotropic cells, fixed global resolution, or balanced constraints, which fragment dense regions and merge unrelated points in sparse ones, thereby increasing the candidate count when probing only a few cells. Our method instead partitions \(\mathbb{R}^d\) into anisotropic Gaussian cells whose shapes align with local geometry and whose sizes adapt to data density. Information-theoretic objectives balance coverage, overlap, and geometric alignment, while split/clone refinement introduces Gaussians only where needed. At query time, Mahalanobis distance selects relevant cells and localized quantization prunes candidates. This yields partitions that reduce cross-cell neighbor splits and candidate counts under small probe…
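The query-time step mentioned in the abstract (Mahalanobis distance selecting relevant cells) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mahalanobis_sq` and `select_cells`, the parameter `n_probe`, and the toy cells are all hypothetical, and each cell is assumed to be stored as a mean plus a precision matrix (inverse covariance). The localized quantization step is not shown.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def mahalanobis_sq(x: Vector, mean: Vector, precision: Matrix) -> float:
    """Squared Mahalanobis distance (x - mean)^T P (x - mean).

    P is the precision matrix (inverse covariance) of one Gaussian cell.
    """
    diff = [xi - mi for xi, mi in zip(x, mean)]
    d = len(diff)
    return sum(diff[i] * precision[i][j] * diff[j]
               for i in range(d) for j in range(d))


def select_cells(query: Vector, means: Sequence[Vector],
                 precisions: Sequence[Matrix], n_probe: int) -> List[int]:
    """Indices of the n_probe cells nearest to the query (hypothetical helper)."""
    dists = [mahalanobis_sq(query, m, p) for m, p in zip(means, precisions)]
    order = sorted(range(len(dists)), key=lambda c: dists[c])
    return order[:n_probe]


# Toy 2-D example with two assumed cells.
means = [(0.0, 0.0), (3.0, 0.0)]
precisions = [
    ((1.0 / 9.0, 0.0), (0.0, 1.0)),  # cell 0: variance 9 along x, 1 along y
    ((1.0, 0.0), (0.0, 1.0)),        # cell 1: isotropic, unit variance
]
query = (2.0, 0.0)

# The query is closer to the mean of cell 1 in Euclidean distance (1 vs. 2),
# but cell 0 is elongated along x, so it wins under Mahalanobis distance.
print(select_cells(query, means, precisions, n_probe=1))
```

In this toy setting the anisotropic cell claims the query even though its mean is farther away in Euclidean terms, which is the kind of behaviour an isotropic partition cannot express.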
Peer Reviews
Decision · ICLR 2026 Conference Withdrawn Submission
The authors aim to improve the partitioning itself in partition-based ANN methods, which has recently been understudied as the focus has shifted to graph-based ANN methods, but which remains an important research area. The authors propose several promising tweaks to make the method work, and the effects of these tweaks are examined in ablation studies.
The proposed method is rather convoluted, and the authors' own experiments give no indication that it is actually useful in practice. Comparisons are performed on three datasets, two of which (MNIST and Fashion-MNIST) contain only 60K points. It is very easy to achieve Recall@1 > 0.9 on these datasets, and in most practical scenarios this is the only regime of interest, yet the proposed method actually performs the same or worse than the simpl
I believe the main motivation of the proposed indexing scheme (the need for anisotropic cells) is valuable and can potentially lead to new approaches in future research.
(1) All the experiments are performed on outdated benchmarks. Real-world applications do not use SIFT or flattened MNIST in 2025, so the results are not informative for practitioners; at least Deep1M should be used. (2) The authors do not report the comparison to the alternatives in terms of wall-clock time, only in terms of accuracy-vs-number_of_retrieved_candidates. But the complexity of retrieving the candidates can be very different for various methods. For instance, I assume that co
- The paper is clearly motivated, addressing the issue of isotropic partitions in traditional ANN methods.
- The framework combines Gaussian parameterization, information-theoretic objectives, and adaptive refinement (split/clone), which are conceptually interesting.
- Experimental comparisons include a variety of ANN baselines (k-Means, LSH, PCA Tree, Faiss-IVF, IVFPQFS), showing reasonable empirical coverage.
**W1. Unclear interaction between loss components and retrieval performance** The total loss L = L_div + L_cov + L_anchor combines three objectives, yet the paper does not provide an analytical or empirical explanation of how these terms interact with retrieval quality. From Table 1, L_div or L_cov can improve coverage, but it is unclear whether improving these terms jointly leads to better query performance or conflicts with L_anchor, as the mutual influence among the three losses remains unde
- [S1] The proposed partitioning scheme seems to adapt well to the underlying data distribution.
- [S2] The explanation of why the proposed partitioning scheme adapts to the underlying data distribution is intuitive.
- [W1] The authors claim that they follow the standard setup of ANN-Benchmarks (Aumüller et al., 2020). However, ANN-Benchmarks uses queries per second (as measured by wall-clock time) to measure the efficiency of the algorithms, whereas the authors use the number of distance computations as a proxy for efficiency. The authors justify this by claiming (lines 330-332) that the dominant inference cost comes from the distance computations of the reranking phase, and refer to the complexity analys
