Multi-lingual Evaluation of Code Generation Models
Ben Athiwaratkun, Sanjay Krishna Gouda, Zijian Wang, Xiaopeng Li, Yuchen Tian, Ming Tan, Wasi Uddin Ahmad, Shiqi Wang, Qing Sun, Mingyue Shang, Sujan Kumar Gonugondla, Hantian Ding, Varun Kumar, Nathan Fulton, Arash Farahani, Siddhartha Jain, Robert Giaquinto, Haifeng Qian

TL;DR
This paper introduces new multilingual benchmarks for code generation models, enabling evaluation across multiple programming languages and demonstrating models' generalization, translation, and few-shot learning capabilities.
Contribution
The authors present MBXP, Multilingual HumanEval, and MathQA-X datasets, along with a scalable conversion framework for multilingual code evaluation, advancing the assessment of language models' coding abilities.
Findings
Multilingual models outperform monolingual models.
Few-shot prompting enables learning new languages.
Models exhibit zero-shot translation capabilities.
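To make the few-shot finding concrete, here is a minimal sketch of how such a prompt could be assembled: a handful of solved problems in the target language are placed before the unsolved one, so the model can pick up the language's syntax from context. The function name `build_few_shot_prompt`, the separator, and the example snippets are hypothetical illustrations, not the prompt format used in the paper.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    target_prompt: str,
    separator: str = "\n\n",
) -> str:
    """Concatenate solved (prompt, solution) pairs ahead of the unsolved prompt.

    Assumption: every example is already written in the target language.
    """
    shots = [prompt + solution for prompt, solution in examples]
    return separator.join(shots + [target_prompt])


# Hypothetical JavaScript examples, purely for illustration.
solved = [
    ("function add(a, b) {\n", "  return a + b;\n}"),
    ("function isEven(n) {\n", "  return n % 2 === 0;\n}"),
]
prompt = build_few_shot_prompt(solved, "function square(x) {\n")
print(prompt)
```

The model's completion of the final, unfinished function would then be run against that problem's test cases.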
Abstract
We present new benchmarks for evaluating code generation models: MBXP, Multilingual HumanEval, and MathQA-X. These datasets cover over 10 programming languages and are generated using a scalable conversion framework that transpiles prompts and test cases from the original Python datasets into the corresponding data in the target language. Using these benchmarks, we are able to assess the performance of code generation models in a multi-lingual fashion, and we discover the generalization ability of language models on out-of-domain languages, the advantages of multi-lingual models over mono-lingual ones, the ability of few-shot prompting to teach the model new languages, and zero-shot translation abilities even in mono-lingual settings. Furthermore, we use our code generation model to perform large-scale bootstrapping to obtain synthetic canonical solutions in several languages, which can be used for…
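The conversion idea described in the abstract can be illustrated with a small sketch that turns a Python function prompt and a Python `assert` test into JavaScript. This is only an approximation under stated assumptions: the helper names `js_prompt` and `python_assert_to_js` are hypothetical, only `assert f(literals...) == literal` tests are handled, and the authors' actual framework is not reproduced here.

```python
import ast
import json
from typing import List


def js_prompt(name: str, params: List[str], description: str) -> str:
    """Render a JavaScript function header with the task description as a doc comment."""
    doc = "\n".join(f" * {line}" for line in description.splitlines())
    return f"/**\n{doc}\n */\nfunction {name}({', '.join(params)}) {{\n"


def python_assert_to_js(assert_src: str) -> str:
    """Convert `assert f(args...) == expected` into a JavaScript check.

    Assumption: arguments and the expected value are plain literals.
    """
    node = ast.parse(assert_src).body[0]
    if not isinstance(node, ast.Assert) or not isinstance(node.test, ast.Compare):
        raise ValueError("expected an assert with a comparison")
    compare = node.test
    if len(compare.ops) != 1 or not isinstance(compare.ops[0], ast.Eq):
        raise ValueError("only `==` comparisons are handled")
    call = compare.left
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError("left-hand side must be a plain function call")
    # literal_eval restricts inputs to literals; json.dumps renders them as JS literals.
    args = [json.dumps(ast.literal_eval(arg)) for arg in call.args]
    expected = json.dumps(ast.literal_eval(compare.comparators[0]))
    actual = f"{call.func.id}({', '.join(args)})"
    return (
        f"if (JSON.stringify({actual}) !== JSON.stringify({expected})) "
        f'{{ throw new Error("test failed"); }}'
    )


print(js_prompt("add", ["a", "b"], "Add two numbers."))
print(python_assert_to_js("assert add(1, 2) == 3"))
```

Parsing the test with `ast` rather than string matching keeps the sketch robust to whitespace and nested list literals; a real converter would also need per-language type handling, which this example leaves out.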
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Machine Learning and Data Classification
Methods: Test
