CodeRepoQA: A Large-scale Benchmark for Software Engineering Question Answering
Ruida Hu, Chao Peng, Jingyi Ren, Bo Jiang, Xiangxin Meng, Qinyun Wu, Pengfei Gao, Xinchen Wang, Cuiyun Gao

TL;DR
CodeRepoQA is a comprehensive, large-scale benchmark dataset for evaluating software engineering question-answering capabilities of language models across multiple programming languages and scenarios.
Contribution
Introduces a novel, large-scale, multi-language benchmark dataset for repository-level software engineering question answering, with extensive data collection and analysis.
Findings
The ten LLMs evaluated still show limitations in software engineering question answering.
Medium-length contexts improve LLM performance.
The benchmark is publicly available for research use.
Abstract
In this work, we introduce CodeRepoQA, a large-scale benchmark specifically designed for evaluating repository-level question-answering capabilities in the field of software engineering. CodeRepoQA encompasses five programming languages and covers a wide range of scenarios, enabling comprehensive evaluation of language models. To construct this dataset, we crawl data from 30 well-known repositories in GitHub, the largest platform for hosting and collaborating on code, and carefully filter raw data. In total, CodeRepoQA is a multi-turn question-answering benchmark with 585,687 entries, covering a diverse array of software engineering scenarios, with an average of 6.62 dialogue turns per entry. We evaluate ten popular large language models on our dataset and provide in-depth analysis. We find that LLMs still have limitations in question-answering capabilities in the field of software…
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software Reliability and Analysis Research
