LitSearch: A Retrieval Benchmark for Scientific Literature Search
Anirudh Ajith, Mengzhou Xia, Alexis Chevalier, Tanya Goyal, Danqi Chen, Tianyu Gao

TL;DR
LitSearch is a new benchmark dataset with 597 complex literature search queries designed to evaluate and improve retrieval systems for scientific literature, revealing significant gaps in current models and tools.
Contribution
The paper introduces LitSearch, a high-quality, manually curated benchmark for scientific literature retrieval, and provides extensive evaluation of current retrieval models and reranking strategies.
Findings
Significant performance gap between BM25 and dense retrievers (24.8% recall@5 difference)
LLM-based reranking improves dense retriever performance by 4.4%
Commercial search engines perform poorly on LitSearch, lagging behind specialized models by up to 32 recall points
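The findings above are reported in recall@k. As a rough illustration, the sketch below computes recall@k under its usual definition: the fraction of a query's relevant papers that appear among the top k retrieved results, averaged over queries. The function names, paper IDs, and toy data are hypothetical and chosen only for this example; the summary does not show the benchmark's own evaluation code, so this should not be read as the authors' implementation.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average recall@k over (ranked list, relevant set) pairs, one pair per query."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    if not scores:
        raise ValueError("runs must contain at least one query")
    return sum(scores) / len(scores)


# Toy data (hypothetical paper IDs, not taken from LitSearch).
toy_runs = [
    (["p1", "p2", "p3", "p4", "p5", "p6"], {"p2", "p6"}),  # one of two relevant papers in top 5
    (["q9", "q3", "q7"], {"q9"}),                          # the only relevant paper is ranked first
]
toy_score = mean_recall_at_k(toy_runs, k=5)
```

With a metric of this form, a gap such as the 24.8-point recall@5 difference corresponds to the difference between two systems' averaged scores on the same query set.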
Abstract
Literature search questions, such as "Where can I find research on the evaluation of consistency in generated summaries?" pose significant challenges for modern search engines and retrieval systems. These questions often require a deep understanding of research concepts and the ability to reason across entire articles. In this work, we introduce LitSearch, a retrieval benchmark comprising 597 realistic literature search queries about recent ML and NLP papers. LitSearch is constructed using a combination of (1) questions generated by GPT-4 based on paragraphs containing inline citations from research papers and (2) questions manually written by authors about their recently published papers. All LitSearch questions were manually examined or edited by experts to ensure high quality. We extensively benchmark state-of-the-art retrieval models and also evaluate two LLM-based reranking…
Taxonomy
Topics: Semantic Web and Ontologies · Biomedical Text Mining and Ontologies
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections
