On the diversity and frequency of code related to mathematical formulas in real-world Java projects
Oliver Moseler, Felix Lemmer, Sebastian Baltes, Stephan Diehl

TL;DR
This study analyzes how often, and in what forms, mathematical formulas are implemented in real-world Java projects. It derives syntactical patterns for detecting such formula code and examines how comments help readers understand it.
Contribution
It introduces syntactical patterns for detecting formula code, estimates their frequency in Java projects, and assesses the usefulness of comments for understanding such code.
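To give a feel for what a syntactical pattern for formula code might look like, here is a minimal sketch. It is not the detector from the paper: the class name `AccumulationHeuristic`, the method `countCandidateLines`, and the regular expression are all assumptions made for illustration. The sketch only flags lines that accumulate into a variable with `+=` or `*=`, the typical shape of a loop body that implements a sum or a product, and it would also flag unrelated code such as a counter increment.

```java
import java.util.regex.Pattern;

// Hypothetical helper, NOT the patterns derived in the paper: it flags lines
// that accumulate into a variable with += or *=, which is the usual shape of
// a loop body implementing a sum or a product.
class AccumulationHeuristic {

    // An identifier (optionally indexed), then += or *=, then an expression and ';'.
    private static final Pattern ACCUMULATION =
            Pattern.compile("\\s*[A-Za-z_]\\w*(\\[[^\\]]*\\])?\\s*[+*]=\\s*[^;]+;\\s*");

    // Counts the lines of the given source text that match the accumulation shape.
    static long countCandidateLines(String source) {
        return source.lines()
                .filter(line -> ACCUMULATION.matcher(line).matches())
                .count();
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "double sum = 0.0;",
                "for (int i = 0; i < n; i++) {",
                "    sum += x[i] * y[i];",
                "}",
                "int count = list.size();");
        System.out.println(countCandidateLines(source)); // prints 1
    }
}
```

A line-based regular expression like this ignores loop context and nesting, so it should be read as a rough candidate filter rather than a reliable detector.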
Findings
In a large sample of engineered open-source Java projects on GitHub, approximately 1 in 700 lines of code implements a sum or product formula.
In a sample of scientific-computing projects, about 1 in 100 lines of code is formula-related.
An online survey indicated that comments help readers understand formula code.
Abstract
In this paper, the term formula code refers to fragments of source code that implement a mathematical formula. We present empirical studies that analyze the diversity and frequency of formula code in open-source-software projects. In an exploratory study, we investigated what kinds of formulas are implemented in real-world Java projects and derived syntactical patterns and constraints. We refined these patterns for sum and product formulas to automatically detect formula code in software archives and to reconstruct the implemented formula in mathematical notation. In a quantitative study of a large sample of engineered Java projects on GitHub we analyzed the frequency of formula code and estimated that one of 700 lines of code in this sample implements a sum or product formula. For a sample of scientific-computing projects, we found that one of 100 lines of code implements a sum or…
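As an illustration of the term, the following sketch shows what formula code for a sum and for a product could look like in Java. The examples are not taken from the studied projects; the class `FormulaCodeExample` and its methods are assumptions chosen for illustration, and the comments state the implemented formula in the way a documenting comment might.

```java
// Illustrative formula code, not taken from the projects analyzed in the paper.
class FormulaCodeExample {

    // Sum formula: s = sum_{i=0}^{n-1} w[i] * x[i]
    static double weightedSum(double[] w, double[] x) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) {
            s += w[i] * x[i];
        }
        return s;
    }

    // Product formula: n! = prod_{i=1}^{n} i
    static long factorial(int n) {
        long p = 1;
        for (int i = 1; i <= n; i++) {
            p *= i;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] w = {0.5, 0.5};
        double[] x = {2.0, 4.0};
        System.out.println(weightedSum(w, x)); // prints 3.0
        System.out.println(factorial(5));      // prints 120
    }
}
```

In both methods the loop bounds correspond to the index range of the formula and the accumulated expression corresponds to its summand or factor.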
