Measuring The Impact Of Programming Language Distribution
Gabriel Orlanski, Kefan Xiao, Xavier Garcia, Jeffrey Hui, Joshua Howland, Jonathan Malmaud, Jacob Austin, Rishabh Singh, Michele Catasta

TL;DR
This paper introduces BabelCode, a framework for execution-based evaluation of neural code models across many languages, and demonstrates that balancing language distributions in training data improves performance on low-resource languages.
Contribution
The paper presents BabelCode, a framework for language-agnostic, execution-based evaluation, and TP3, a new code translation dataset. It uses both to study how balancing the language distribution of the training data affects model performance.
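To make "balancing the language distribution" concrete, here is a minimal sketch of one generic rebalancing scheme, temperature-scaled sampling. This summary does not say which scheme the authors use, so the technique, the function name `temperature_rebalance`, and the per-language counts are all assumptions for illustration only.

```python
def temperature_rebalance(counts: dict[str, float], temperature: float) -> dict[str, float]:
    """Return per-language sampling weights p_i proportional to (n_i / N) ** (1 / T).

    T = 1 reproduces the natural distribution; larger T flattens it
    toward uniform, which raises the share of low-resource languages.
    NOTE: a generic technique, not confirmed to be the paper's method.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    # Scale each language's natural share by the exponent 1 / T.
    scaled = {lang: (n / total) ** (1.0 / temperature) for lang, n in counts.items()}
    norm = sum(scaled.values())
    # Renormalize so the weights form a probability distribution.
    return {lang: w / norm for lang, w in scaled.items()}


# Hypothetical token counts, invented purely for illustration.
example_counts = {"python": 900.0, "java": 90.0, "haskell": 10.0}
natural = temperature_rebalance(example_counts, temperature=1.0)
flatter = temperature_rebalance(example_counts, temperature=2.0)
```

In this toy setup, raising the temperature increases the sampling weight of the smallest language and lowers that of the largest. That is the same direction of trade-off the findings describe between low-resource and high-resource languages.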
Findings
Balanced training data improves pass@k by 12.34% on average across all tasks and languages.
On the translation tasks, balancing improves low-resource exact match by 30.77% on average.
High-resource language performance decreases by 12.94%.
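For readers unfamiliar with pass@k, the metric can be sketched with the widely used unbiased estimator 1 - C(n - c, k) / C(n, k), where n samples are generated per problem and c of them pass every test case. This summary does not state which estimator the authors use, so treat this as the standard formulation rather than a description of BabelCode itself. The function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of generated samples
    c: number of those samples that pass all test cases
    k: number of samples the metric is allowed to draw
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset
    # necessarily contains at least one passing sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing
    # sample, subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A benchmark-level score is then the mean of `pass_at_k` over all problems.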
Abstract
Current benchmarks for evaluating neural code models focus on only a small subset of programming languages, excluding many popular languages such as Go or Rust. To ameliorate this issue, we present the BabelCode framework for execution-based evaluation of any benchmark in any language. BabelCode enables new investigations into the qualitative performance of models' memory, runtime, and individual test case results. Additionally, we present a new code translation dataset called Translating Python Programming Puzzles (TP3) from the Python Programming Puzzles (Schuster et al. 2021) benchmark that involves translating expert-level python functions to any language. With both BabelCode and the TP3 benchmark, we investigate if balancing the distributions of 14 languages in a training dataset improves a large language model's performance on low-resource languages. Training a model on a balanced…
Taxonomy
Topics: Software Engineering Research · Machine Learning and Data Classification · Advanced Neural Network Applications
Methods: Test
