An empirical evaluation of the usefulness of Tree Kernels for Commit-time Defect Detection in large software systems
Hareem Sahar, Yuxin Liu, Abram Hindle, Denilson Barbosa

TL;DR
This paper evaluates how well tree kernel methods detect defects at commit time in large software systems, focusing on method-level analysis and comparing against existing clone detection tools.
Contribution
It introduces a tree-kernel-based approach to commit-time defect detection that analyzes changes at the method level, and compares its performance with NiCad on benchmark datasets.
Findings
Tree kernels achieve clone detection performance comparable to NiCad's.
The approach effectively detects defect-inducing commits with high accuracy.
Method-level analysis provides targeted insights for developers.
Abstract
Defect detection at commit check-in time prevents the introduction of defects into software systems. Current defect detection approaches rely on metric-based models which are not very accurate and whose results are not directly useful for developers. We propose a method to detect bug-inducing commits by comparing the incoming changes with all past commits in the project, considering both those that introduced defects and those that did not. Our method considers individual changes in the commit separately, at the method-level granularity. Doing so helps developers as they are informed of specific methods that need further attention instead of being told that the entire commit is problematic. Our approach represents source code as abstract syntax trees and uses tree kernels to estimate the similarity of the code with previous commits. We experiment with subtree kernels (STK), subset tree…
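The abstract's core idea, scoring an incoming method's AST against methods from past commits with a tree kernel, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy `Node` type stands in for a real Java AST, the subtree kernel (STK) is computed as the decayed count of node pairs whose complete subtrees are identical (each match weighted by lam**size), the kernel is cosine-normalized, and the incoming method simply takes the label of its most similar past method (a 1-nearest-neighbor rule). The names `Node`, `subtree_profile`, `subtree_kernel`, `normalized_kernel`, `predict_buggy`, the decay parameter `lam`, and the example trees are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Toy AST node (hypothetical): a node-type label and ordered children."""
    label: str
    children: tuple = ()


def subtree_profile(root):
    """Count every complete subtree of `root` by a canonical string signature.

    Returns (counts, sizes): counts[sig] is how often that subtree occurs and
    sizes[sig] is its number of nodes. Labels are assumed not to contain
    '(', ')' or ',' so signatures stay unambiguous.
    """
    counts, sizes = Counter(), {}

    def visit(node):
        child_sigs = [visit(child) for child in node.children]
        sig = f"{node.label}({','.join(child_sigs)})"
        sizes[sig] = 1 + sum(sizes[s] for s in child_sigs)
        counts[sig] += 1
        return sig

    visit(root)
    return counts, sizes


def subtree_kernel(a, b, lam=1.0):
    """STK sketch: sum over node pairs with identical complete subtrees of lam**size."""
    counts_a, sizes = subtree_profile(a)
    counts_b, _ = subtree_profile(b)
    shared = counts_a.keys() & counts_b.keys()
    return sum(counts_a[s] * counts_b[s] * lam ** sizes[s] for s in shared)


def normalized_kernel(a, b, lam=1.0):
    """Cosine-normalized kernel in [0, 1]; a tree compared with itself scores 1.0."""
    denom = math.sqrt(subtree_kernel(a, a, lam) * subtree_kernel(b, b, lam))
    return subtree_kernel(a, b, lam) / denom if denom else 0.0


def predict_buggy(method, history, lam=1.0):
    """Assumed 1-nearest-neighbor rule over (tree, is_buggy) pairs from past commits."""
    tree, is_buggy = max(history, key=lambda item: normalized_kernel(method, item[0], lam))
    return is_buggy, normalized_kernel(method, tree, lam)


def leaf(label):
    return Node(label)


# Hypothetical past methods, labeled by whether their commit introduced a defect.
clean_past = Node("Method", (
    Node("If", (Node("Eq", (leaf("Name"), leaf("Null"))), leaf("Return"))),
    Node("Call", (leaf("Name"),)),
))
buggy_past = Node("Method", (Node("Call", (leaf("Name"),)), Node("Call", (leaf("Name"),))))
incoming = Node("Method", (Node("Call", (leaf("Name"),)),))

label, score = predict_buggy(incoming, [(clean_past, False), (buggy_past, True)])
print(label, round(score, 3))  # prints: True 0.77
```

Here the incoming method shares more subtrees with the buggy past method than with the clean one, so it is flagged at the method level rather than for the whole commit. A real pipeline would parse actual source methods and could swap in the other kernels the abstract mentions (for example, the subset tree kernel); the decision rule above is only an illustrative assumption.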
Taxonomy
Topics: Software Engineering Research · Software System Performance and Reliability · Software Testing and Debugging Techniques
