Do Current Language Models Support Code Intelligence for R Programming Language?
ZiXiao Zhao, Fatemeh H. Fard

TL;DR
This paper investigates how effectively pre-trained language models for code understand the R programming language, revealing performance challenges and the impact of R's unique syntax and project context.
Contribution
First evaluation of Code-PLMs on R language tasks, including dataset collection, analysis of R-specific syntax effects, and insights into model limitations.
Findings
Models show performance degradation on R code tasks.
R's dual syntax paradigms affect model accuracy.
Project context influences cross-project training performance.
Abstract
Recent advancements in developing Pre-trained Language Models for Code (Code-PLMs) have spurred many areas of Software Engineering (SE) and brought breakthrough results for many SE tasks. Though these models have achieved state-of-the-art performance on SE tasks for many popular programming languages, such as Java and Python, scientific software and its related languages, such as the R programming language, have rarely benefited from or even been evaluated with Code-PLMs. Research has shown that R differs in many ways from other programming languages and requires specific techniques. In this study, we provide the first insights into code intelligence for R. For this purpose, we collect and open-source an R dataset, and evaluate Code-PLMs on the two tasks of code summarization and method name prediction using several settings and strategies, including the differences in two R styles,…
Taxonomy
Topics: Data Mining Algorithms and Applications · Data Analysis with R
Methods: Balanced Selection
