Rethinking Similarity Search: Embracing Smarter Mechanisms over Smarter Data
Renzhi Wu, Jingfan Meng, Jie Jeff Xu, Huayi Wang, Kexin Rong

TL;DR
This vision paper argues that similarity search should be improved through better search mechanisms, not only better embeddings, with applications such as large-scale language models, video clip retrieval, and data labeling in mind.
Contribution
It proposes three directions for similarity search that go beyond improving data quality: exploiting implicit data structures and distributions, engaging users in an iterative feedback loop, and moving beyond a single query vector.
Findings
Preliminary insights into new search pathways.
Identification of challenges in large-scale applications.
Potential for improved retrieval effectiveness.
Abstract
In this vision paper, we propose a shift in perspective for improving the effectiveness of similarity search. Rather than focusing solely on enhancing the data quality, particularly machine learning-generated embeddings, we advocate for a more comprehensive approach that also enhances the underpinning search mechanisms. We highlight three novel avenues that call for a redefinition of the similarity search problem: exploiting implicit data structures and distributions, engaging users in an iterative feedback loop, and moving beyond a single query vector. These novel pathways have gained relevance in emerging applications such as large-scale language models, video clip retrieval, and data labeling. We discuss the corresponding research challenges posed by these new problem areas and share insights from our preliminary discoveries.
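Two of the avenues named in the abstract, iterative user feedback and moving beyond a single query vector, can be illustrated with a small sketch. The code below is not the authors' method: it swaps in a classic Rocchio-style relevance-feedback update and a simple "best match over several query vectors" scoring rule as stand-ins. The function names (`top_k`, `refine`), the weights `alpha`, `beta`, `gamma`, and the toy two-dimensional corpus are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(queries, corpus, k=2):
    """Rank items by their best similarity to ANY of the query vectors.

    Hypothetical multi-query rule: an item's score is the maximum cosine
    similarity over all query vectors. Ties are broken by item index.
    """
    scored = [(max(cosine(q, v) for q in queries), i)
              for i, v in enumerate(corpus)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [i for _, i in scored[:k]]

def refine(query, corpus, liked, disliked, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style feedback: move the query toward liked items and away
    from disliked ones. The weights are illustrative defaults, not values
    taken from the paper."""
    def mean(ids):
        if not ids:
            return [0.0] * len(query)
        return [sum(corpus[i][d] for i in ids) / len(ids)
                for d in range(len(query))]
    pos, neg = mean(liked), mean(disliked)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, pos, neg)]

# Toy corpus of 2-D "embeddings" (made up for this example).
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]

single = top_k([[1.0, 0.0]], corpus)             # one query vector
multi = top_k([[1.0, 0.0], [0.0, 1.0]], corpus)  # two query vectors
refined = refine([1.0, 0.0], corpus, liked=[2], disliked=[0])
after_feedback = top_k([refined], corpus, k=1)   # search again after feedback
```

With a single query vector the results cluster around one direction; with two query vectors the top results cover both directions; and after one round of feedback the refined query shifts toward the item the user marked as relevant. A real system would run this over an approximate nearest-neighbor index rather than a linear scan.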
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Multimodal Machine Learning Applications
Methods: Contrastive Language-Image Pre-training
