CodeRepoQA: A Large-scale Benchmark for Software Engineering Question Answering

Ruida Hu; Chao Peng; Jingyi Ren; Bo Jiang; Xiangxin Meng; Qinyun Wu; Pengfei Gao; Xinchen Wang; Cuiyun Gao

arXiv:2412.14764·cs.SE·December 20, 2024



Open Access

TL;DR

CodeRepoQA is a large-scale benchmark for evaluating the repository-level software engineering question-answering capabilities of language models across five programming languages and a wide range of scenarios.

Contribution

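Introduces a large-scale, multi-language benchmark for repository-level software engineering question answering: 585,687 multi-turn entries built from filtered data crawled from 30 well-known GitHub repositories, together with an evaluation of ten popular large language models.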

Findings

01 LLMs have limitations in software engineering QA tasks.

02 Medium-length contexts improve LLM performance.

03 Benchmark is publicly available for research use.

Abstract

In this work, we introduce CodeRepoQA, a large-scale benchmark specifically designed for evaluating repository-level question-answering capabilities in the field of software engineering. CodeRepoQA encompasses five programming languages and covers a wide range of scenarios, enabling comprehensive evaluation of language models. To construct this dataset, we crawl data from 30 well-known repositories on GitHub, the largest platform for hosting and collaborating on code, and carefully filter the raw data. In total, CodeRepoQA is a multi-turn question-answering benchmark with 585,687 entries, covering a diverse array of software engineering scenarios, with an average of 6.62 dialogue turns per entry. We evaluate ten popular large language models on our dataset and provide in-depth analysis. We find that LLMs still have limitations in question-answering capabilities in the field of software…
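As a rough illustration of what a multi-turn entry and the "average dialogue turns per entry" statistic might look like, here is a minimal Python sketch. The class names, field names (`repo`, `language`, `role`, `text`), and the toy data are all hypothetical; the abstract does not specify the benchmark's actual schema, so this should not be read as the dataset's real format.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str  # hypothetical label, e.g. "asker" or "responder"
    text: str


@dataclass
class Entry:
    repo: str      # hypothetical field: source repository
    language: str  # hypothetical field: programming language
    turns: list[Turn] = field(default_factory=list)


def average_turns(entries: list[Entry]) -> float:
    """Mean number of dialogue turns per entry."""
    if not entries:
        raise ValueError("entries must not be empty")
    return sum(len(e.turns) for e in entries) / len(entries)


# Toy data for illustration only; not drawn from the benchmark.
entries = [
    Entry("example/repo-a", "Python", [
        Turn("asker", "Why does the import fail?"),
        Turn("responder", "Which version are you running?"),
    ]),
    Entry("example/repo-b", "Go", [
        Turn("asker", "The build hangs on CI."),
        Turn("responder", "Can you share the log?"),
        Turn("asker", "Log attached."),
        Turn("responder", "Try pinning the dependency."),
    ]),
]

print(average_turns(entries))
```

Computed the same way over the full dataset, this statistic is what the abstract reports as 6.62 turns per entry.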


Taxonomy

Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software Reliability and Analysis Research