The impact of feature importance methods on the interpretation of defect classifiers
Gopi Krishnan Rajbahadur, Shaowei Wang, Yasutaka Kamei, Ahmed E. Hassan

TL;DR
This study evaluates how different feature importance methods affect the interpretation of defect classifiers, revealing inconsistencies and the influence of feature interactions on importance rankings across multiple software projects.
Contribution
The paper empirically compares classifier specific (CS) and classifier agnostic (CA) feature importance methods, reporting how strongly their feature importance ranks agree and how feature interactions affect that agreement.
Findings
CA methods show strong agreement in feature importance rankings.
CS methods often produce vastly different importance rankings.
Removing feature interactions improves agreement between CA and CS methods.
Abstract
Classifier specific (CS) and classifier agnostic (CA) feature importance methods are widely used (often interchangeably) by prior studies to derive feature importance ranks from a defect classifier. However, different feature importance methods are likely to compute different feature importance ranks even for the same dataset and classifier. Hence such interchangeable use of feature importance methods can lead to conclusion instabilities unless there is a strong agreement among different methods. Therefore, in this paper, we evaluate the agreement between the feature importance ranks associated with the studied classifiers through a case study of 18 software projects and six commonly used classifiers. We find that: 1) The computed feature importance ranks by CA and CS methods do not always strongly agree with each other. 2) The computed feature importance ranks by the studied CA methods…
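To make "agreement between feature importance ranks" concrete, here is a minimal sketch in plain Python. It is not the authors' procedure: the excerpt above does not state which agreement measures the study uses, so Kendall's tau and top-k overlap are assumptions chosen for illustration. The feature names and importance scores are hypothetical as well.

```python
from itertools import combinations

def ranks(importances):
    """Map each feature to its rank (1 = most important) from raw scores.

    Assumes no tied scores; ties would be broken by insertion order.
    """
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {feature: pos for pos, feature in enumerate(ordered, start=1)}

def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two tie-free rankings of the same features."""
    features = sorted(rank_a)
    concordant = discordant = 0
    for f, g in combinations(features, 2):
        # Positive product: both rankings order the pair the same way.
        product = (rank_a[f] - rank_a[g]) * (rank_b[f] - rank_b[g])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(features) * (len(features) - 1) // 2
    return (concordant - discordant) / n_pairs

def top_k_overlap(rank_a, rank_b, k):
    """Fraction of features shared by the top-k sets of both rankings."""
    top_a = {f for f, r in rank_a.items() if r <= k}
    top_b = {f for f, r in rank_b.items() if r <= k}
    return len(top_a & top_b) / k

# Hypothetical importance scores for the same classifier and dataset,
# one set from a CS method and one from a CA method (made-up values).
cs_scores = {"loc": 0.40, "churn": 0.30, "complexity": 0.20, "authors": 0.10}
ca_scores = {"loc": 0.15, "churn": 0.45, "complexity": 0.30, "authors": 0.10}

cs_ranks = ranks(cs_scores)
ca_ranks = ranks(ca_scores)

tau = kendall_tau(cs_ranks, ca_ranks)
top1 = top_k_overlap(cs_ranks, ca_ranks, 1)
top3 = top_k_overlap(cs_ranks, ca_ranks, 3)
print(f"tau={tau:.3f} top-1 overlap={top1:.2f} top-3 overlap={top3:.2f}")
```

In this made-up case the two methods select the same three features for the top 3 but disagree on which one ranks first. Swapping two methods that disagree in this way would change which feature a study reports as most important.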
