Methods for Stabilizing Models across Large Samples of Projects (with case studies on Predicting Defect and Project Health)
Suvodeep Majumder, Tianpei Xia, Rahul Krishna, Tim Menzies

TL;DR
This paper introduces STABILIZER, a transfer learning framework that generates stable, high-performing models across large samples of software projects, demonstrating improved speed and generalizability in predicting defects and project health.
Contribution
The paper presents STABILIZER, a novel transfer learning method that efficiently produces stable models applicable to hundreds of projects, outperforming existing approaches in speed and stability.
Findings
STABILIZER produces a minimal number of models: one for defect prediction (756 projects) and fewer than a dozen for project health (1628 projects).
Models generated by STABILIZER perform as well as or better than the prior state of the art.
STABILIZER is significantly faster than previous transfer learning methods.
Abstract
Despite decades of research, SE lacks widely accepted models (that offer precise quantitative stable predictions) about what factors most influence software quality. This paper provides a promising result showing such stable models can be generated using a new transfer learning framework called "STABILIZER". Given a tree of recursively clustered projects (using project meta-data), STABILIZER promotes a model upwards if it performs best in the lower clusters (stopping when the promoted model performs worse than the models seen at a lower level). The number of models found by STABILIZER is minimal: one for defect prediction (756 projects) and less than a dozen for project health (1628 projects). Hence, via STABILIZER, it is possible to find a few projects which can be used for transfer learning and make conclusions that hold across hundreds of projects at a time. Further, the models…
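The promotion rule described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is one possible reading under stated assumptions. The names `Cluster`, `cluster_score`, `promote`, and the tolerance `tol` are hypothetical, a "model" is represented only by the name of the project it was trained on, and `score(model, project)` is an assumed callback where higher is better. The sketch also assumes that once promotion stops in a subtree, every surviving model from that subtree is passed upward unchanged.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List

# Hypothetical scoring callback: score(source_model, target_project) -> float,
# where a larger value means the source model transfers better to the target.
Score = Callable[[str, str], float]

@dataclass
class Cluster:
    projects: List[str]                                   # every project under this node
    children: List["Cluster"] = field(default_factory=list)

def cluster_score(model: str, projects: List[str], score: Score) -> float:
    """Mean score of one model over all projects in a cluster."""
    return mean(score(model, p) for p in projects)

def promote(node: Cluster, score: Score, tol: float = 0.0) -> List[str]:
    """Return the models that survive at this node of the cluster tree."""
    if not node.children:
        # Leaf cluster: keep the model that does best on its own peers.
        return [max(node.projects, key=lambda m: cluster_score(m, node.projects, score))]
    per_child = [promote(c, score, tol) for c in node.children]
    survivors = [m for models in per_child for m in models]
    # Assumption: if promotion already stopped below, do not promote further.
    if any(len(models) > 1 for models in per_child):
        return survivors
    best = max(survivors, key=lambda m: cluster_score(m, node.projects, score))
    for child, (local,) in zip(node.children, per_child):
        # Stop if the promoted model is worse than the model seen at the lower level.
        if cluster_score(best, child.projects, score) < cluster_score(local, child.projects, score) - tol:
            return survivors
    return [best]

# Toy usage: model "a" transfers well everywhere, so it is promoted to the root.
left, right = Cluster(["a", "b"]), Cluster(["c", "d"])
root = Cluster(["a", "b", "c", "d"], [left, right])
print(promote(root, lambda m, p: 0.9 if m == "a" else 0.5))  # ['a']
```

In this toy setup a single model survives at the root, mirroring the "one model for many projects" outcome the abstract reports for defect prediction. When models only work inside their own cluster, the same function returns one model per cluster instead.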
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software System Performance and Reliability
