RepoQA: Evaluating Long Context Code Understanding

Jiawei Liu; Jia Le Tian; Vijay Daita; Yuxiang Wei; Yifeng Ding; Yuhan Katherine Wang; Jun Yang; Lingming Zhang

arXiv:2406.06025·cs.SE·June 11, 2024

TL;DR

This paper introduces RepoQA, a benchmark for evaluating large language models' ability to understand long-context code across multiple languages, revealing insights into model performance and understanding strategies.

Contribution

The paper presents RepoQA, a novel multilingual benchmark focusing on long-context code understanding, with a new task called Searching Needle Function (SNF).

Findings

1. Model performance varies across programming languages.
2. Proprietary models slightly outperform open models.
3. Models may understand code better when comments are removed.

Abstract

Recent advances have been improving the context windows of Large Language Models (LLMs). To quantify the real long-context capabilities of LLMs, evaluators such as the popular Needle in a Haystack have been developed to test LLMs over a large chunk of raw texts. While effective, current evaluations overlook the insight of how LLMs work with long-context code, i.e., repositories. To this end, we initiate the RepoQA benchmark to evaluate LLMs on long-context code understanding. Traditional needle testers ask LLMs to directly retrieve the answer from the context without necessary deep understanding. In RepoQA, we built our initial task, namely Searching Needle Function (SNF), which exercises LLMs to search functions given their natural-language description, i.e., LLMs cannot find the desired function if they cannot understand the description and code. RepoQA is multilingual and…
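
The SNF setup described in the abstract can be illustrated with a minimal Python sketch. It is not the benchmark's implementation: the helper names (`build_prompt`, `is_needle_found`), the prompt wording, the `difflib` similarity metric, and the 0.8 threshold are all assumptions made for illustration. The idea shown is only that the model sees many functions plus a natural-language description, and its answer counts as correct when it is closest to the described (needle) function.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in metric (assumption)."""
    return SequenceMatcher(None, a, b).ratio()


def build_prompt(functions: dict[str, str], description: str) -> str:
    """Concatenate all functions into one long context, then ask for the needle.

    The prompt wording is hypothetical, chosen only for this sketch.
    """
    context = "\n\n".join(functions.values())
    return (
        f"{context}\n\n"
        f"Function description: {description}\n"
        "Reproduce the function that matches this description."
    )


def is_needle_found(
    functions: dict[str, str],
    needle_name: str,
    model_output: str,
    threshold: float = 0.8,  # assumed cutoff for this sketch
) -> bool:
    """Return True if the output is closest to the needle and similar enough."""
    best_name = max(
        functions, key=lambda name: similarity(functions[name], model_output)
    )
    best_score = similarity(functions[best_name], model_output)
    return best_name == needle_name and best_score >= threshold


# Toy "repository" with two functions; real contexts would be far longer.
functions = {
    "add": "def add(a, b):\n    return a + b",
    "read_lines": (
        "def read_lines(path):\n"
        "    with open(path) as f:\n"
        "        return f.read().splitlines()"
    ),
}
prompt = build_prompt(functions, "Returns the sum of two values.")
```

In this sketch, a model that echoes the `add` function verbatim is scored as a hit, while one that returns `read_lines` is scored as a miss, because the description alone (not the function name) identifies the target.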

Code & Models

Repositories

evalplus/repoqa
PyTorch · Official

Taxonomy

Topics: Time Series Analysis and Forecasting · Semantic Web and Ontologies · Natural Language Processing Techniques