Long Code Arena: a Set of Benchmarks for Long-Context Code Models
Egor Bogomolov, Aleksandra Eliseeva, Timur Galimzyanov, Evgeniy Glukhov, Anton Shapkin, Maria Tigina, Yaroslav Golubev, Alexander Kovrigin, Arie van Deursen, Maliheh Izadi, Timofey Bryksin

TL;DR
Long Code Arena introduces six project-wide code processing benchmarks that evaluate models on tasks requiring long-context understanding, addressing a significant gap in existing benchmarks.
Contribution
This work provides a comprehensive suite of benchmarks, datasets, evaluation tools, and baseline solutions for long-context code processing tasks, facilitating research in this area.
Findings
Benchmark datasets for six code tasks are publicly available.
Open-source baselines demonstrate practical usage.
The benchmarks enable evaluation of models on project-wide context understanding.
Abstract
Nowadays, the fields of code and natural language processing are evolving rapidly. In particular, models become better at processing long context windows - supported context sizes have increased by orders of magnitude over the last few years. However, there is a shortage of benchmarks for code processing that go beyond a single file of context, while the most popular ones are limited to a single method. With this work, we aim to close this gap by introducing Long Code Arena, a suite of six benchmarks for code processing tasks that require project-wide context. These tasks cover different aspects of code processing: library-based code generation, CI builds repair, project-level code completion, commit message generation, bug localization, and module summarization. For each task, we provide a manually verified dataset for testing, an evaluation suite, and open-source baseline solutions…
Code & Models
- JetBrains-Research/lca-bug-localization · dataset · 759 downloads
- JetBrains-Research/lca-commit-message-generation · dataset · 112 downloads
- JetBrains-Research/lca-ci-builds-repair · dataset · 57 downloads
- JetBrains-Research/lca-module-summarization · dataset · 319 downloads
- JetBrains-Research/lca-project-level-code-completion · dataset · 820 downloads
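
The identifiers listed above can be kept in one place and turned into links or loader calls. The following is a minimal sketch, not the authors' tooling. It assumes the datasets are hosted on the Hugging Face Hub and that the `datasets` library is installed. The helper names (`LCA_DATASETS`, `dataset_url`, `load_task`) are hypothetical, and configuration and split names are not given here, so they are left to the caller.

```python
# Dataset identifiers exactly as listed above, keyed by the task names
# used in the abstract. Only the five listed datasets are included.
LCA_DATASETS = {
    "bug localization": "JetBrains-Research/lca-bug-localization",
    "commit message generation": "JetBrains-Research/lca-commit-message-generation",
    "CI builds repair": "JetBrains-Research/lca-ci-builds-repair",
    "module summarization": "JetBrains-Research/lca-module-summarization",
    "project-level code completion": "JetBrains-Research/lca-project-level-code-completion",
}

# Assumption: the datasets live on the Hugging Face Hub.
HUB_PREFIX = "https://huggingface.co/datasets/"


def dataset_url(task: str) -> str:
    """Return the assumed Hub page URL for a task's dataset."""
    return HUB_PREFIX + LCA_DATASETS[task]


def load_task(task: str, **kwargs):
    """Load a task's dataset with the `datasets` library.

    Configuration names and splits are not specified in this listing,
    so they must be passed through `kwargs` (for example `name=...`,
    `split=...`) after checking the dataset page.
    """
    from datasets import load_dataset  # imported lazily; needs network access

    return load_dataset(LCA_DATASETS[task], **kwargs)


if __name__ == "__main__":
    for task in LCA_DATASETS:
        print(f"{task}: {dataset_url(task)}")
```

Keeping the loader separate from the URL helper means the mapping can be inspected offline, while the actual download only happens when `load_task` is called.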
Taxonomy
Topics: Distributed and Parallel Computing Systems · Model-Driven Software Engineering Techniques · Power Systems and Technologies
