Graph-Based Algorithms for Diverse Similarity Search
Piyush Anand, Piotr Indyk, Ravishankar Krishnaswamy, Sepideh Mahabadi, Vikas C. Raykar, Kirankumar Shiragur, Haike Xu

TL;DR
This paper introduces the first graph-based algorithms for approximate nearest neighbor search with diversity constraints, significantly improving efficiency over traditional two-stage methods on datasets of low intrinsic dimension.
Contribution
It presents provably efficient, single-stage graph-based algorithms for diverse similarity search, bypassing the need for large initial retrieval sets and post-processing.
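This summary does not reproduce the paper's algorithms. As a rough illustration of what "single-stage" can mean, the sketch below is a hypothetical variant of ordinary greedy beam search on a proximity graph in which the beam itself enforces a diversity constraint, so no oversized candidate list is collected and post-processed afterwards. The constraint used here is an assumption chosen for concreteness: every point carries a label (for example, a seller), and at most `cap` results may share a label. The function name `diverse_beam_search`, its parameters, and the eviction rule are all illustrative. The sketch carries none of the paper's guarantees.

```python
import math
from typing import Dict, Hashable, List, Sequence


def diverse_beam_search(graph: Dict[int, List[int]], points: Sequence[Sequence[float]],
                        labels: Sequence[Hashable], query: Sequence[float],
                        start: int, L: int, k: int, cap: int) -> List[int]:
    """Hypothetical greedy beam search whose beam holds at most `cap` points per label.

    Assumes cap >= 1 and L >= k. Not the paper's algorithm; illustration only.
    """

    def dist(i: int) -> float:
        return math.dist(points[i], query)

    beam = [start]      # sorted by distance to the query, at most L entries
    expanded = set()    # nodes whose neighbor lists have already been scanned
    seen = {start}      # nodes ever considered for the beam

    while True:
        # Expand the closest beam entry that has not been expanded yet.
        frontier = [i for i in beam if i not in expanded]
        if not frontier:
            break
        u = frontier[0]
        expanded.add(u)
        for v in graph[u]:
            if v in seen:
                continue
            seen.add(v)
            same = [i for i in beam if labels[i] == labels[v]]
            if len(same) >= cap:
                # Label is full: v enters only by displacing its farthest same-label point.
                worst = max(same, key=dist)
                if dist(v) >= dist(worst):
                    continue
                beam.remove(worst)
            beam.append(v)
            beam.sort(key=dist)
            del beam[L:]   # keep the beam at size L
    return beam[:k]
```

Consider five points on a line at x = 0..4 with labels a, a, a, b, b, connected as a path, and queried at the origin from node 4. With `cap=1` the search returns `[0, 3]`, one result per label. With `cap=2` it returns `[0, 1]`, which matches plain nearest-neighbor order.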
Findings
Algorithms achieve sublinear query time depending on $k$ and …
Experimental results show substantial speedup over traditional methods.
Applies to datasets of low intrinsic dimension, with theoretical guarantees.
Abstract
Nearest neighbor search is a fundamental data structure problem with many applications in machine learning, computer vision, recommendation systems and other fields. Although the main objective of the data structure is to quickly report data points that are closest to a given query, it has long been noted (Carbonell and Goldstein, 1998) that without additional constraints the reported answers can be redundant and/or duplicative. This issue is typically addressed in two stages: in the first stage, the algorithm retrieves a (large) number of points closest to the query, while in the second stage, the points are post-processed and a small subset is selected to maximize the desired diversity objective. Although popular, this method suffers from a fundamental efficiency bottleneck, as the set of points retrieved in the first stage often needs to be much larger than the final output.…
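The two-stage baseline the abstract describes can be sketched as follows. Stage 1 retrieves the `L` nearest points. Stage 2 greedily selects `k` of them using Maximal Marginal Relevance (MMR), the criterion introduced by Carbonell and Goldstein (1998), with similarity taken as negative Euclidean distance. The names `two_stage_diverse`, `L`, `k`, and the trade-off weight `lam` are illustrative assumptions. The exact scan in stage 1 stands in for whatever approximate index a real system would use.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def two_stage_diverse(points: List[Point], query: Point, L: int, k: int,
                      lam: float = 0.5) -> List[int]:
    """Two-stage baseline: retrieve L nearest candidates, then pick k diverse ones.

    Stage 2 is MMR with similarity = -Euclidean distance (an illustrative choice).
    """
    # Stage 1: the L nearest points to the query (exact scan here; an ANN index in practice).
    candidates = sorted(range(len(points)),
                        key=lambda i: math.dist(points[i], query))[:L]
    selected: List[int] = []

    def score(i: int) -> float:
        relevance = -math.dist(points[i], query)
        # Distance to the nearest already-selected point; 0 when nothing is selected yet.
        novelty = min((math.dist(points[i], points[j]) for j in selected), default=0.0)
        return lam * relevance + (1 - lam) * novelty

    # Stage 2: greedily add the candidate with the best relevance/novelty trade-off.
    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

As a usage example, take `pts = [(1.0, 0.0), (1.01, 0.0), (0.0, 1.2)]` and the query `(0.0, 0.0)`. With `L=3, k=2` the function returns `[0, 2]`, skipping the near-duplicate at index 1. With `L=2, k=2` it returns `[0, 1]`, because stage 1 never retrieves index 2 and stage 2 has nothing diverse to choose from. This is the bottleneck the abstract points to: `L` must often be much larger than `k`.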
Taxonomy
Topics: Data Management and Algorithms · Data Mining Algorithms and Applications · Web Data Mining and Analysis
