Comparing Neighbors Together Makes it Easy: Jointly Comparing Multiple Candidates for Efficient and Effective Retrieval
Jonghyun Song, Cheyon Jin, Wenlong Zhao, Andrew McCallum, Jay-Yoon Lee

TL;DR
The paper introduces CMC, a scalable framework that compares multiple candidates simultaneously using self-attention, improving retrieval recall and accuracy while maintaining efficiency in retrieval and reranking tasks.
Contribution
It proposes the CMC framework that enables efficient, simultaneous comparison of many candidates, enhancing retrieval and ranking performance over traditional bi-encoder and cross-encoder methods.
Findings
CMC improves recall@k by up to 4.8 percentage points.
CMC is 11 times faster than cross-encoders.
CMC enhances top-1 accuracy in downstream tasks like entity linking and dialogue ranking.
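The recall@k figure above can be read with a small sketch of the metric. This assumes one gold candidate per query and counts a hit when that candidate appears in the top k of the ranked list. The function name and toy data are illustrative and not taken from the paper, whose exact evaluation protocol is not shown here.

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of queries whose gold candidate appears in the top-k results.

    ranked_ids: one ranked candidate list per query (best first).
    gold_ids:   the single gold candidate for each query (an assumption).
    """
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)


# Toy rankings for three queries (made-up identifiers).
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
gold = ["b", "f", "z"]

for k in (1, 2, 3):
    print(k, recall_at_k(ranked, gold, k))
```

Under this definition, a gain of 4.8 percentage points means, for example, moving from 0.800 to 0.848 at the same k.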
Abstract
A common retrieve-and-rerank paradigm involves retrieving relevant candidates from a broad set using a fast bi-encoder (BE), followed by applying expensive but accurate cross-encoders (CE) to a limited candidate set. However, relying on this small subset is often susceptible to error propagation from the bi-encoders, which limits the overall performance. To address these issues, we propose the Comparing Multiple Candidates (CMC) framework. CMC compares a query and multiple embeddings of similar candidates (i.e., neighbors) through shallow self-attention layers, delivering rich representations contextualized to each other. Furthermore, CMC is scalable enough to handle multiple comparisons simultaneously. For example, comparing ~10K candidates with CMC takes a similar amount of time as comparing 16 candidates with CE. Experimental results on the ZeSHEL dataset demonstrate that CMC, when…
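The joint comparison described in the abstract can be illustrated with a minimal sketch: the query embedding and the candidate embeddings are placed in one sequence, passed through a few self-attention layers so that each vector is contextualized by the others, and the candidates are then scored against the contextualized query. Everything below is an assumption made for illustration: the identity projections (a trained model would use learned weights), the residual connection, the two layers, the dot-product scoring, and all function names. It is not the authors' implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """One single-head self-attention layer with a residual connection.

    Assumption: query/key/value projections are the identity; a trained
    model would learn these weights.
    """
    d = len(vectors[0])
    out = []
    for q in vectors:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in vectors])
        context = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
        out.append([a + b for a, b in zip(q, context)])
    return out


def joint_scores(query_emb, candidate_embs, num_layers=2):
    """Score all candidates at once against the query (hypothetical helper)."""
    seq = [query_emb] + candidate_embs  # query first, then its neighbors
    for _ in range(num_layers):
        seq = self_attention(seq)
    q_ctx, cand_ctx = seq[0], seq[1:]
    return [dot(q_ctx, c) for c in cand_ctx]


# Toy 2-dimensional embeddings, as if precomputed by a bi-encoder.
query = [1.0, 0.0]
candidates = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.3]]

scores = joint_scores(query, candidates)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

Because the inputs are single embedding vectors rather than full token sequences, the attention cost depends on the number of candidates and not on text length, which is consistent with the abstract's claim that many candidates can be compared at once.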
Taxonomy
Topics: Information Retrieval and Search Behavior
Methods: Sparse Evolutionary Training
