Utilizing Source Code Syntax Patterns to Detect Bug Inducing Commits using Machine Learning Models
Md Nadim, Banani Roy

TL;DR
This paper introduces a novel method for extracting source code syntax pattern features to improve bug-inducing commit detection using machine learning, demonstrating enhanced performance across multiple datasets and models.
Contribution
It proposes a new feature extraction approach based on source code syntax patterns, improving bug prediction accuracy over traditional features.
Findings
The proposed features improve bug detection performance across five ML models.
The features further improve detection accuracy when combined with a Deep Belief Network (DBN).
The proposed features make bug predictions more explainable.
Abstract
Detecting Bug-Inducing Commits (BICs), or Just-in-Time (JIT) defect prediction, with Machine Learning (ML) models requires tabulated feature values extracted from the source code or the historical maintenance data of a software system. Existing studies have used metadata from source code repositories (which we call GitHub Statistics, or GS), n-gram-based processing of source code text, and developer information (e.g., a developer's experience) as feature values in ML-based bug detection models. However, these features do not represent the source code syntax styles or patterns that a developer might prefer over the valid alternatives a programming language provides. This investigation proposes a method to extract features from the source code syntax patterns of software commits and investigates whether they are helpful in detecting bug proneness in…
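The abstract does not say which parser, language, or exact syntax patterns the authors use, so the following is only a hypothetical sketch of the general idea: representing a commit by how its syntax choices change, rather than by repository metadata or n-gram text. It assumes Python source parsed with the standard `ast` module and uses per-node-type counts as stand-in "syntax pattern" features; `syntax_pattern_counts` and `commit_feature_vector` are illustrative names, not the paper's method.

```python
import ast
from collections import Counter
from typing import List, Sequence


def syntax_pattern_counts(source: str) -> Counter:
    """Count AST node types in a piece of Python source.

    Assumption: node-type frequencies stand in for the paper's
    (unspecified) syntax pattern features.
    """
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def commit_feature_vector(before: str, after: str,
                          vocabulary: Sequence[str]) -> List[int]:
    """Return one tabular row per commit: the change in each node-type
    count between the pre-commit and post-commit versions of a file.

    Missing node types count as 0 because Counter defaults to 0.
    """
    old = syntax_pattern_counts(before)
    new = syntax_pattern_counts(after)
    return [new[name] - old[name] for name in vocabulary]


if __name__ == "__main__":
    # Two valid alternatives for the same computation: an explicit loop
    # versus a generator expression passed to sum().
    before = "total = 0\nfor x in xs:\n    total += x\n"
    after = "total = sum(x for x in xs)\n"
    vocab = ["For", "AugAssign", "GeneratorExp", "Call"]
    print(commit_feature_vector(before, after, vocab))
```

Running the example prints `[-1, -1, 1, 1]`: the commit removes one `for` loop and one augmented assignment and introduces one generator expression and one call. Rows like this, built over a fixed vocabulary of node types, could then be fed to any tabular ML classifier alongside or instead of GS features.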
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software System Performance and Reliability
