The Java Build Framework: Large Scale Compilation
Pedro Martins, Rohan Achar, Cristina V. Lopes

TL;DR
The paper introduces the Java Build Framework, a method and tool that automatically compiles a large percentage of the Java projects found in open source repositories such as GitHub, so that research requiring compilable code can scale beyond small benchmarks.
Contribution
It presents a novel method combining a large JAR repository and fault resolution techniques to automatically compile extensive Java projects from open source repositories.
Findings
Successfully compiled a large percentage of Java projects from GitHub.
Enabled large-scale research on Java codebases with guaranteed compilability.
Improved the reproducibility of Java project analyses.
Abstract
Large repositories of source code for research tend to limit their utility to static analysis of the code, as they give no guarantees on whether the projects are compilable, much less runnable in any way. The immediate consequence of the lack of large compilable and runnable datasets is that research that requires such properties does not generalize beyond small benchmarks. We present the Java Build Framework, a method and tool capable of automatically compiling a large percentage of Java projects available in open source repositories like GitHub. Two elements are at its core: a very large repository of JAR files, and techniques for resolving compilation faults and dependencies.
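The pairing of a JAR repository with compilation-fault resolution can be illustrated with a minimal sketch. It is not the authors' implementation: the class name MissingDependencyResolver, the package-to-JAR index, and the sample JAR name are all hypothetical, and the only fault handled is javac's "package ... does not exist" diagnostic. The idea shown is to read the compiler output, look up each missing package in the index, and collect the JARs to add to the classpath before retrying the build.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: maps "missing package" compiler errors to candidate JARs.
public class MissingDependencyResolver {

    // Matches javac diagnostics of the form "error: package a.b.c does not exist".
    private static final Pattern MISSING_PACKAGE =
            Pattern.compile("error: package ([\\w.]+) does not exist");

    // Assumed index from package name to one JAR that provides it.
    private final Map<String, String> packageToJar;

    public MissingDependencyResolver(Map<String, String> packageToJar) {
        this.packageToJar = packageToJar;
    }

    // Returns the JARs to add to the classpath, in the order they were first needed.
    // Packages that are absent from the index are skipped.
    public Set<String> resolve(List<String> compilerOutput) {
        Set<String> jars = new LinkedHashSet<>();
        for (String line : compilerOutput) {
            Matcher m = MISSING_PACKAGE.matcher(line);
            if (m.find()) {
                String jar = packageToJar.get(m.group(1));
                if (jar != null) {
                    jars.add(jar);
                }
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        // Illustrative index entry; a real index would cover a large JAR collection.
        MissingDependencyResolver resolver = new MissingDependencyResolver(
                Map.of("org.apache.commons.lang3", "commons-lang3.jar"));

        List<String> output = List.of(
                "Foo.java:1: error: package org.apache.commons.lang3 does not exist",
                "Foo.java:9: error: cannot find symbol");

        System.out.println(resolver.resolve(output));
    }
}
```

In a fuller pipeline this lookup would presumably run in a loop: compile, resolve the reported faults, extend the classpath, and compile again until the build succeeds or no new JARs can be found.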
Taxonomy
Topics: Software Engineering Research · Model-Driven Software Engineering Techniques · Scientific Computing and Data Management
