On the Anatomy of Real-World R Code for Static Analysis
Florian Sihler, Lukas Pietzschmann, Raphael Straub, Matthias Tichy, Andor Diera, Abdelhalim Dahou

TL;DR
This study conducts a large-scale static analysis of over 50 million lines of real-world R code to understand usage patterns, informing static analysis tools and interpreter optimizations.
Contribution
It provides the first comprehensive analysis of real-world R code usage, highlighting features most relevant for static analysis and optimization.
Findings
High frequency of name-based indexing, assignments, and loops.
Low usage of reflective functions and foreign function interface.
Differences between user scripts and package sources in size and usage patterns.
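To make the feature categories above concrete, here is a rough sketch of counting assignments, name-based indexing, loops, and reflective calls in a snippet of R source. It is not the authors' method: it uses crude regular expressions where a real analysis would work on a parsed syntax tree. The pattern set, the function names, and the sample R snippet are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, crude lexical patterns (assumption for illustration only).
# A real static analysis would inspect a parsed AST, not raw text.
PATTERNS = {
    # Arrow assignments: <-, <<-, ->, ->> (the `=` form is deliberately ignored).
    "assignment": re.compile(r"<<?-|->>?"),
    # Name-based indexing with `$`, e.g. df$value.
    "name_index": re.compile(r"\$\s*[A-Za-z.][A-Za-z0-9._]*"),
    # for/while loops followed by "(", plus the bare `repeat` keyword.
    "loop": re.compile(r"\b(?:for|while)\s*\(|\brepeat\b"),
    # A small, non-exhaustive sample of reflective base R functions.
    "reflective_call": re.compile(r"\b(?:eval|parse|body|formals|environment)\s*\("),
}


def strip_comments(line):
    """Naively drop everything after '#'; ignores '#' inside string literals."""
    return line.split("#", 1)[0]


def count_features(r_source):
    """Return a Counter of pattern name -> number of textual matches."""
    counts = Counter({name: 0 for name in PATTERNS})
    for line in r_source.splitlines():
        code = strip_comments(line)
        for name, pattern in PATTERNS.items():
            counts[name] += len(pattern.findall(code))
    return counts


# Made-up R snippet used only to exercise the counter.
SAMPLE = """\
df <- read.csv("data.csv")   # load data
total <- 0
for (v in df$value) {
  total <- total + v
}
f <- eval(parse(text = "function(x) x + 1"))
"""

if __name__ == "__main__":
    print(dict(count_features(SAMPLE)))
```

Text matching like this miscounts in many cases (for example `x < -1`, or `$` inside a string), which is one reason a syntax-tree-based analysis is the more reliable route.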
Abstract
CONTEXT: The R programming language has a huge and active community, especially in the area of statistical computing. Its interpreted nature allows for several interesting constructs, like the manipulation of functions at run-time, that hinder the static analysis of R programs. At the same time, there is a lack of existing research on how these features, or even the R language as a whole, are used in practice.
OBJECTIVE: In this paper, we conduct a large-scale static analysis of more than 50 million lines of real-world R programs and packages to identify their characteristics and the features that are actually used. Moreover, we compare the similarities and differences between the scripts of R users and the implementations of package authors. We provide insights for static analysis tools like the lintr package as well as potential interpreter optimizations and uncover areas for…
Taxonomy
Topics: Statistical Methods and Inference · Data Analysis with R · Statistical Methods and Bayesian Inference
