CoSQA+: Pioneering the Multi-Choice Code Search Benchmark with Test-Driven Agents
Jing Gong, Yanghui Wu, Linxi Liang, Yanlin Wang, Jiachi Chen, Mingwei Liu, Zibin Zheng

TL;DR
This paper introduces CoSQA+, a multi-choice code search benchmark annotated by test-driven agents, which improve annotation accuracy and scalability and enable better training and evaluation of code search models.
Contribution
It presents a novel automated pipeline with test-driven verification for high-quality code search data annotation, surpassing previous datasets in accuracy and scalability.
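To illustrate what test-driven verification can mean for a query-code pair, here is a minimal sketch. It is not the paper's pipeline: the summary does not say how the agents generate or run tests, so the function name `passes_test` and the example snippets are hypothetical. The idea shown is only that a candidate is labeled by executing it against a test, not by reading it.

```python
def passes_test(candidate_source: str, test_source: str) -> bool:
    """Return True if the candidate code runs and satisfies the test.

    Hypothetical helper for illustration only. `exec` is unsafe on
    untrusted code; a real pipeline would need an isolated sandbox.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # define the candidate function(s)
        exec(test_source, namespace)       # run assertions against them
    except Exception:
        # Any error, including a failed assertion, counts as a mismatch.
        return False
    return True


# Query (assumed example): "reverse a string"
good_candidate = "def reverse_string(s):\n    return s[::-1]\n"
bad_candidate = "def reverse_string(s):\n    return s\n"
test_case = "assert reverse_string('abc') == 'cba'\n"

good_label = passes_test(good_candidate, test_case)
bad_label = passes_test(bad_candidate, test_case)
```

A reviewer who only reads the two candidates might accept either as plausible; running the test separates them.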
Findings
Test-driven agents achieve 93.9% accuracy in annotations.
Models trained on CoSQA+ outperform those trained on previous datasets.
CoSQA+ contains over 412,000 agent-annotated pairs and 1,000 human-verified pairs.
Abstract
Semantic code search, retrieving code that matches a given natural language query, is an important task to improve productivity in software engineering. Existing code search datasets face limitations: they rely on human annotators who assess code primarily through semantic understanding rather than functional verification, leading to potential inaccuracies and scalability issues. Additionally, current evaluation metrics often overlook the multi-choice nature of code search. This paper introduces CoSQA+, pairing high-quality queries from CoSQA with multiple suitable codes. We develop an automated pipeline featuring multiple model-based candidate selections and the novel test-driven agent annotation system. Among a single Large Language Model (LLM) annotator and Python expert annotators (without test-based verification), agents leverage test-based verification and achieve the highest…
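The point about metrics overlooking the multi-choice setting can be made concrete with a small sketch. The truncated abstract does not name the metric the paper adopts, so the comparison below is an assumption chosen for illustration: reciprocal rank, which credits only the first correct code, against average precision, which credits every correct code. The code IDs and rankings are made up.

```python
from typing import Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, code_id in enumerate(ranked, start=1):
        if code_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, code_id in enumerate(ranked, start=1):
        if code_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# One query with two correct codes (illustrative IDs).
relevant = {"c1", "c2"}
ranking_a = ["c1", "c2", "c3", "c4"]  # both correct codes at the top
ranking_b = ["c1", "c4", "c3", "c2"]  # second correct code ranked last

rr_a = reciprocal_rank(ranking_a, relevant)
rr_b = reciprocal_rank(ranking_b, relevant)
ap_a = average_precision(ranking_a, relevant)
ap_b = average_precision(ranking_b, relevant)
```

Reciprocal rank scores both rankings identically because it stops at the first hit, while average precision penalizes `ranking_b` for burying the second correct code.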
Taxonomy
Topics: Natural Language Processing Techniques · Web Data Mining and Analysis · Topic Modeling
