An Exploratory Study on Just-in-Time Multi-Programming-Language Bug Prediction
Zengyang Li, Jiabao Ji, Peng Liang, Ran Mo, Hui Liu

TL;DR
This study develops and evaluates machine learning models for predicting multi-programming-language bugs in software systems, demonstrating the effectiveness of certain metrics and the feasibility of cross-project prediction.
Contribution
It introduces the first just-in-time (JIT) multi-programming-language bug (MPLB) prediction models built with machine learning, identifies the key prediction metrics, and shows that cross-project models outperform within-project models.
Findings
Random Forest is the most suitable algorithm for JIT MPLB prediction.
Key metrics include changed LOC, added LOC, and total lines.
Cross-project training improves prediction performance.
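To make the size metrics named above concrete, here is a minimal sketch of how change-level values could be aggregated over all files touched by one commit. The `FileChange` record, the `change_metrics` function, and the definition of changed LOC as added plus deleted lines are assumptions for illustration; the paper's exact metric definitions are not given in this summary.

```python
from dataclasses import dataclass


@dataclass
class FileChange:
    """Hypothetical per-file record for one commit (not from the paper)."""
    path: str
    added: int        # lines added to this file by the commit
    deleted: int      # lines deleted from this file by the commit
    total_lines: int  # size of the file in lines (assumed: after the change)


def change_metrics(files):
    """Aggregate size metrics over all files in a commit.

    Assumption: changed LOC = added LOC + deleted LOC.
    """
    added = sum(f.added for f in files)
    deleted = sum(f.deleted for f in files)
    return {
        "added_loc": added,
        "changed_loc": added + deleted,
        "total_lines": sum(f.total_lines for f in files),
    }


# Example commit touching files in two programming languages.
commit = [
    FileChange("core/Parser.java", added=10, deleted=2, total_lines=100),
    FileChange("native/parser.c", added=5, deleted=3, total_lines=50),
]
metrics = change_metrics(commit)
```

Each commit then becomes one feature row, which is the granularity a just-in-time model works at.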
Abstract
Context: An increasing number of software systems are written in multiple programming languages (PLs); these are called multi-programming-language (MPL) systems. MPL bugs (MPLBs) are bugs whose resolution involves multiple PLs. Despite the high complexity of MPLB resolution, MPLB prediction methods are lacking. Objective: This work aims to construct just-in-time (JIT) MPLB prediction models with selected prediction metrics, analyze the significance of the metrics, and then evaluate the performance of cross-project JIT MPLB prediction. Method: We develop JIT MPLB prediction models with the selected metrics using machine learning algorithms and evaluate the models in within-project and cross-project contexts with our constructed dataset based on 18 Apache MPL projects. Results: Random Forest is appropriate for JIT MPLB prediction. Changed LOC of all files, added LOC of all files,…
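The difference between the within-project and cross-project contexts can be sketched as two ways of splitting labelled commits into training and test sets. This is only an illustration under stated assumptions: the commit dictionaries, the function names, the chronological 80/20 split, and the leave-one-project-out scheme are hypothetical and are not taken from the paper.

```python
def within_project_split(commits, project, train_fraction=0.8):
    """Train on a project's earlier commits, test on its later ones.

    Assumption: a chronological split; the paper may use another scheme.
    """
    own = sorted(
        (c for c in commits if c["project"] == project),
        key=lambda c: c["timestamp"],
    )
    cut = int(len(own) * train_fraction)
    return own[:cut], own[cut:]


def cross_project_split(commits, target_project):
    """Train on every other project, test on the target project."""
    train = [c for c in commits if c["project"] != target_project]
    test = [c for c in commits if c["project"] == target_project]
    return train, test


# Toy data: project names are placeholders, labels mark MPLB-inducing commits.
commits = [
    {"project": "A", "timestamp": t, "is_mplb": t % 2 == 0} for t in range(5)
] + [
    {"project": "B", "timestamp": t, "is_mplb": t % 3 == 0} for t in range(3)
]

wp_train, wp_test = within_project_split(commits, "A")
cp_train, cp_test = cross_project_split(commits, "A")
```

In the cross-project case no commit of the target project is seen during training, which is what makes the setting usable for projects with little labelled history.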
Taxonomy
Topics: Software Engineering Research · Software Engineering Techniques and Practices · Software System Performance and Reliability
