Long Code Arena: a Set of Benchmarks for Long-Context Code Models

Egor Bogomolov; Aleksandra Eliseeva; Timur Galimzyanov; Evgeniy Glukhov; Anton Shapkin; Maria Tigina; Yaroslav Golubev; Alexander Kovrigin; Arie van Deursen; Maliheh Izadi; Timofey Bryksin

arXiv:2406.11612 · cs.LG · June 18, 2024 · 5 citations

TL;DR

Long Code Arena introduces six project-wide code processing benchmarks that evaluate models on tasks requiring long-context understanding. It addresses a gap in current code processing benchmarks, which rarely go beyond a single file of context.

Contribution

This work provides a comprehensive suite of benchmarks, datasets, evaluation tools, and baseline solutions for long-context code processing tasks, facilitating research in this area.

Findings

01. Benchmark datasets for six code tasks are publicly available.

02. Open-source baseline solutions show how to use the benchmarks in practice.

03. The benchmarks enable evaluating how well models understand project-wide context.

Abstract

The fields of code and natural language processing are evolving rapidly. In particular, models are becoming better at processing long context windows: supported context sizes have increased by orders of magnitude over the last few years. However, there is a shortage of code processing benchmarks that go beyond a single file of context, and the most popular ones are limited to a single method. With this work, we aim to close this gap by introducing Long Code Arena, a suite of six benchmarks for code processing tasks that require project-wide context. These tasks cover different aspects of code processing: library-based code generation, CI build repair, project-level code completion, commit message generation, bug localization, and module summarization. For each task, we provide a manually verified dataset for testing, an evaluation suite, and open-source baseline solutions…

Code & Models

Repositories

jetbrains-research/lca-baselines (official)

Taxonomy

Topics: Distributed and Parallel Computing Systems · Model-Driven Software Engineering Techniques · Power Systems and Technologies