Measuring The Impact Of Programming Language Distribution

Gabriel Orlanski; Kefan Xiao; Xavier Garcia; Jeffrey Hui; Joshua Howland; Jonathan Malmaud; Jacob Austin; Rishabh Singh; Michele Catasta

arXiv:2302.01973·cs.LG·May 25, 2023

TL;DR

This paper introduces BabelCode, a framework for execution-based evaluation of neural code models across many languages, and demonstrates that balancing language distributions in training data improves performance on low-resource languages.

Contribution

It presents BabelCode for language-agnostic evaluation and a new translation dataset, TP3, to study the impact of language distribution balancing on model performance.
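To make "language distribution balancing" concrete, the sketch below flattens a skewed per-language corpus with temperature-based sampling. This is an illustrative stand-in, not necessarily the procedure used in the paper; the function name `balanced_weights`, the `temperature` parameter, and the example counts are all hypothetical.

```python
def balanced_weights(counts: dict, temperature: float) -> dict:
    """Return per-language sampling probabilities.

    counts: hypothetical number of training examples per language.
    temperature: 1.0 keeps the natural distribution; larger values
    move the distribution toward uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(counts.values())
    # Raise each natural probability to the power 1/T, then renormalize.
    scaled = {lang: (n / total) ** (1.0 / temperature) for lang, n in counts.items()}
    norm = sum(scaled.values())
    return {lang: s / norm for lang, s in scaled.items()}


# Hypothetical corpus: one high-resource and one low-resource language.
example_counts = {"python": 900, "rust": 100}
natural = balanced_weights(example_counts, temperature=1.0)
flattened = balanced_weights(example_counts, temperature=2.0)
```

With these made-up counts, raising the temperature increases the share of the low-resource language at the expense of the high-resource one, which is the trade-off the paper studies.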

Findings

01

Training on a balanced corpus improves pass@k by 12.34% on average across all tasks and languages, compared to the baseline.

02

On the translation tasks, balancing improves low-resource language pass@k by 30.77% on average.

03

High-resource language performance decreases slightly, by 12.94%.
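The findings above are stated in terms of pass@k. As a reference point, here is a minimal sketch of the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass all tests. Whether the paper computes pass@k exactly this way is an assumption; the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes.

    n: total samples generated for a problem.
    c: number of those samples that pass every test case.
    k: sample budget being evaluated (1 <= k <= n).
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Fewer than k failing samples: any draw of k must include a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The benchmark-level score is typically the mean of this value over all problems, and a relative change such as those reported above would then compare two such means.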

Abstract

Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language. BabelCode enables new investigations into the qualitative performance of models' memory, runtime, and individual test case results. Additionally, we present a new code translation dataset called Translating Python Programming Puzzles (TP3) from the Python Programming Puzzles (Schuster et al. 2021) benchmark that involves translating expert-level Python functions to any language. With both BabelCode and the TP3 benchmark, we investigate if balancing the distributions of 14 languages in a training dataset improves a large language model's performance on low-resource languages. Training a model on a balanced…

Code & Models

Repositories

google-research/babelcode
Official

Taxonomy

Topics: Software Engineering Research · Machine Learning and Data Classification · Advanced Neural Network Applications

Methods: Test