SDPERL: A Framework for Software Defect Prediction Using Ensemble Feature Extraction and Reinforcement Learning
Mohsen Hesamolhokama, Amirahmad Shafiee, Mohammadreza Ahmaditeshnizi, Mohammadamin Fazli, Jafar Habibi

TL;DR
This paper introduces SDPERL, a novel framework combining ensemble feature extraction and reinforcement learning for file-level software defect prediction, demonstrating improved accuracy over traditional methods.
Contribution
The work is among the first to integrate ensemble feature extraction with reinforcement learning for defect prediction at the file level.
Findings
Achieved an average 6.25% improvement in F1-Score over baseline models.
Effectively identified the most predictive features using RL and ACO-inspired mechanisms.
Demonstrated superior performance on the PROMISE dataset.
Abstract
Ensuring software quality remains a critical challenge in complex and dynamic development environments, where software defects can result in significant operational and financial risks. This paper proposes an innovative framework for software defect prediction that combines ensemble feature extraction with reinforcement learning (RL)-based feature selection. We claim that this work is among the first in recent efforts to address this challenge at the file-level granularity. The framework extracts diverse semantic and structural features from source code using five code-specific pre-trained models. Feature selection is enhanced through a custom-defined embedding space tailored to represent feature interactions, coupled with a pheromone table mechanism inspired by Ant Colony Optimization (ACO) to guide the RL agent effectively. Using the Proximal Policy Optimization (PPO) algorithm, the…
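To make the pheromone-table idea concrete, the following is a minimal sketch of ACO-style feature selection, not the authors' implementation. It assumes a toy setting: the PPO agent is replaced by plain pheromone-weighted sampling, and the reward is the mean of hypothetical per-feature usefulness scores rather than a classifier's F1-Score. The function name `select_features` and all parameters are illustrative assumptions.

```python
import random


def select_features(scores, n_select, n_rounds=50, evaporation=0.1, seed=0):
    """Toy pheromone-guided feature selection (illustrative only).

    scores: hypothetical per-feature usefulness values in [0, 1]; in a real
            pipeline the reward would come from evaluating a classifier.
    Returns the indices of the n_select features with the most pheromone,
    plus the final pheromone table.
    """
    rng = random.Random(seed)
    n = len(scores)
    pheromone = [1.0] * n  # uniform start: no feature is preferred yet

    for _ in range(n_rounds):
        # Sample a subset without replacement, weighted by pheromone.
        candidates = list(range(n))
        subset = []
        for _ in range(n_select):
            weights = [pheromone[i] for i in candidates]
            pick = rng.choices(candidates, weights=weights, k=1)[0]
            subset.append(pick)
            candidates.remove(pick)

        # Stand-in reward: mean usefulness of the chosen subset.
        reward = sum(scores[i] for i in subset) / n_select

        # Evaporate all trails, then deposit reward on the chosen features.
        pheromone = [(1.0 - evaporation) * p for p in pheromone]
        for i in subset:
            pheromone[i] += reward

    ranked = sorted(range(n), key=lambda i: pheromone[i], reverse=True)
    return ranked[:n_select], pheromone
```

Calling `select_features([0.9, 0.1, 0.8, 0.05, 0.2], n_select=2)` returns two feature indices and the pheromone table. Features that repeatedly appear in high-reward subsets accumulate pheromone and are sampled more often, which is the guiding effect the abstract attributes to the pheromone table.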
Taxonomy
Topics: Software Engineering Research · Imbalanced Data Classification Techniques · Software Reliability and Analysis Research
