Semantic Recall for Vector Search
Leonardo Kuffo, Ioanna Tsakalidou, Roberta De Viti, Albert Angel, Jiří Iša, Rastislav Lenhardt

TL;DR
Semantic Recall is a new metric for evaluating approximate nearest neighbor search algorithms. It counts only the retrieval of semantically relevant objects, which improves assessment especially for queries with few relevant results.
Contribution
The paper introduces Semantic Recall and Tolerant Recall metrics, providing a better evaluation framework for retrieval quality in embedding datasets.
Findings
Semantic Recall better assesses retrieval quality for semantically relevant objects.
Optimizing for Semantic Recall and Tolerant Recall improves cost-quality tradeoffs.
Tolerant Recall approximates Semantic Recall when relevant objects are unknown.
Abstract
We introduce Semantic Recall, a novel metric to assess the quality of approximate nearest neighbor search algorithms by considering only semantically relevant objects that are theoretically retrievable via exact nearest neighbor search. Unlike traditional recall, semantic recall does not penalize algorithms for failing to retrieve objects that are semantically irrelevant to the query, even if those objects are among their nearest neighbors. We demonstrate that semantic recall is particularly useful for assessing retrieval quality on queries that have few relevant results among their nearest neighbors, a scenario we uncover to be common within embedding datasets. Additionally, we introduce Tolerant Recall, a proxy metric that approximates semantic recall when semantically relevant objects cannot be identified. We empirically show that our metrics are more effective indicators of retrieval…
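The contrast the abstract draws between traditional recall and semantic recall can be sketched in a few lines of Python. This is a minimal illustration based only on the description above, not the authors' implementation: the function names, the use of plain ID collections, and the convention of returning 1.0 when a query has no relevant exact neighbors are all assumptions made for the example.

```python
def traditional_recall(retrieved, exact_neighbors):
    """Fraction of the exact nearest neighbors that the ANN search returned."""
    exact = set(exact_neighbors)
    if not exact:
        return 1.0  # assumption: nothing to retrieve counts as perfect recall
    return len(exact & set(retrieved)) / len(exact)


def semantic_recall(retrieved, exact_neighbors, relevant):
    """Fraction of the semantically relevant exact neighbors that were returned.

    Only objects that are both relevant to the query and reachable by exact
    nearest neighbor search are counted; irrelevant exact neighbors are ignored.
    """
    target = set(exact_neighbors) & set(relevant)
    if not target:
        return 1.0  # assumption: no relevant neighbors means nothing was missed
    return len(target & set(retrieved)) / len(target)


# Hypothetical query: 5 exact neighbors, of which only objects 1 and 2 are relevant.
exact_neighbors = [1, 2, 3, 4, 5]
relevant = {1, 2}
retrieved = [1, 2, 6, 7, 8]  # the ANN search found both relevant objects

print(traditional_recall(retrieved, exact_neighbors))         # 0.4
print(semantic_recall(retrieved, exact_neighbors, relevant))  # 1.0
```

In this made-up query, traditional recall penalizes the search for missing objects 3, 4, and 5 even though none of them is relevant, while the semantic variant reports that every relevant, retrievable object was found.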
