CoIR: A Comprehensive Benchmark for Code Information Retrieval Models
Xiangyang Li, Kuicai Dong, Yi Quan Lee, Wei Xia, Hao Zhang, Xinyi Dai, Yasheng Wang, Ruiming Tang

TL;DR
COIR is a comprehensive benchmark of diverse datasets and tasks for evaluating code information retrieval models. It highlights where current models struggle and is intended to facilitate further research.
Contribution
The paper presents COIR, a new, extensive benchmark for code retrieval, comprising datasets, an evaluation framework, and an analysis of existing models' performance.
Findings
The nine widely used retrieval models evaluated, including state-of-the-art systems, show significant difficulty on code retrieval tasks.
COIR's diverse datasets reveal gaps in current model capabilities.
The benchmark facilitates cross-domain and cross-task evaluation of code retrieval systems.
Abstract
Despite the substantial success of Information Retrieval (IR) in various NLP tasks, most IR systems predominantly handle queries and corpora in natural language, neglecting the domain of code retrieval. Code retrieval is critically important yet remains under-explored, with existing methods and benchmarks inadequately representing the diversity of code in various domains and tasks. Addressing this gap, we present COIR (Code Information Retrieval Benchmark), a robust and comprehensive benchmark specifically designed to assess code retrieval capabilities. COIR comprises ten meticulously curated code datasets, spanning eight distinctive retrieval tasks across seven diverse domains. We first discuss the construction of COIR and its diverse dataset composition. Further, we evaluate nine widely used retrieval models using COIR, uncovering significant difficulties in performing code retrieval…
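To make "evaluating a retrieval model on a benchmark" concrete, here is a minimal sketch of scoring ranked results against relevance judgments. The text above does not say which metric COIR uses; nDCG@k is assumed here only because it is a common choice in retrieval benchmarks. The function names (`dcg_at_k`, `ndcg_at_k`, `mean_ndcg`) and the toy query and document ids are hypothetical and are not taken from the paper or its code.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance values."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_ids, relevant, k=10):
    """nDCG@k for a single query.

    ranked_ids: document ids in the order the model returned them.
    relevant:   dict mapping document id -> graded relevance (absent means 0).
    """
    gains = [relevant.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(relevant.values(), reverse=True)  # best possible ordering
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(gains, k) / ideal_dcg


def mean_ndcg(run, qrels, k=10):
    """Average nDCG@k over every query that has relevance judgments."""
    scores = [ndcg_at_k(run.get(qid, []), rels, k) for qid, rels in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data (hypothetical): one text-to-code query with one relevant snippet.
qrels = {"q1": {"snippet_7": 1}}
run = {"q1": ["snippet_3", "snippet_7", "snippet_9"]}
print(round(mean_ndcg(run, qrels, k=10), 4))
```

In this toy run the only relevant snippet is returned at rank 2, so the score falls below 1.0; returning it at rank 1 would give a perfect score.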
Taxonomy
Topics: Advanced Computational Techniques and Applications
