Neural Code Search Evaluation Dataset

Hongyu Li; Seohyun Kim; Satish Chandra

arXiv:1908.09804 · cs.SE · October 3, 2019 · 24 cites

TL;DR

This paper introduces a new evaluation dataset for neural code search models, enabling standardized benchmarking and comparison of different approaches in natural language to code retrieval tasks.

Contribution

It provides a publicly available dataset of query-code pairs and baseline results, facilitating consistent evaluation in neural code search research.

Findings

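01 The two baseline code search models from recent work ([1] and [6]) show varying performance on the dataset.

02 The shared set of query and code snippet pairs lets code search methods be compared on a common benchmark.

03 Future work on code search can report results against this dataset.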

Abstract

There has been an increase of interest in code search using natural language. Assessing the performance of such code search models can be difficult without a readily available evaluation suite. In this paper, we present an evaluation dataset consisting of natural language query and code snippet pairs, with the hope that future work in this area can use this dataset as a common benchmark. We also provide the results of two code search models ([1] and [6]) from recent work. The evaluation dataset is available at https://github.com/facebookresearch/Neural-Code-Search-Evaluation-Dataset
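To illustrate how a benchmark of natural language query and code snippet pairs can be used, here is a minimal sketch that scores a toy retriever. Everything in it is an assumption made for illustration: the snippets and queries are invented and are not taken from the released dataset, the token-overlap retriever is a stand-in rather than either of the models in [1] or [6], and the metrics (mean reciprocal rank and hit rate at k) are common retrieval measures that the abstract does not itself name.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical corpus: snippet id -> code text (invented for this sketch).
SNIPPETS: Dict[str, str] = {
    "s1": "imm.hide_soft_input_from_window(view.get_window_token(), 0)",
    "s2": "text = open(path).read()",
    "s3": "stream = bitmap.compress(format, 100, byte_stream)",
}

# Hypothetical evaluation pairs: (query, id of the snippet judged relevant).
PAIRS: List[Tuple[str, str]] = [
    ("hide soft keyboard", "s1"),
    ("read a text file", "s2"),
    ("convert image to bytes from stream", "s3"),
]


def tokens(text: str) -> set:
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_snippets(query: str, snippets: Dict[str, str]) -> List[str]:
    """Rank snippet ids by token overlap with the query (stand-in retriever).

    Ties are broken by snippet id so the ranking is deterministic.
    """
    q = tokens(query)
    scored = [(-len(q & tokens(code)), sid) for sid, code in snippets.items()]
    return [sid for _, sid in sorted(scored)]


def evaluate(pairs: List[Tuple[str, str]],
             snippets: Dict[str, str],
             k: int) -> Dict[str, float]:
    """Return mean reciprocal rank and the fraction of queries answered in the top k."""
    reciprocal_ranks = []
    hits = 0
    for query, gold_id in pairs:
        ranking = rank_snippets(query, snippets)
        rank = ranking.index(gold_id) + 1  # 1-based position of the relevant snippet
        reciprocal_ranks.append(1.0 / rank)
        if rank <= k:
            hits += 1
    return {
        "mrr": sum(reciprocal_ranks) / len(pairs),
        "hit_at_k": hits / len(pairs),
    }


if __name__ == "__main__":
    print(evaluate(PAIRS, SNIPPETS, k=1))
```

Swapping `rank_snippets` for a learned model's ranking function would leave `evaluate` unchanged, which is the practical point of sharing one set of pairs across models.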

Code & Models

Datasets

facebook/neural_code_search
dataset· 318 dl

Taxonomy

Topics: Natural Language Processing Techniques · Software Engineering Research · Topic Modeling