Machine Learning Pipeline for Software Engineering: A Systematic Literature Review
Samah Kansab

TL;DR
This systematic review analyzes current machine learning pipelines in software engineering, highlighting best practices, common challenges, and future research directions to improve software quality and development efficiency.
Contribution
It consolidates state-of-the-art ML pipeline components for SE, identifies effective techniques, and discusses gaps and challenges to guide future research and practice.
Findings
Robust preprocessing techniques like SMOTE and SZZ improve model reliability.
Ensemble methods such as Random Forest and Gradient Boosting generally outperform other algorithms across the reviewed SE tasks.
Evaluation metrics like AUC, F1-score, and BAM are used to assess models.
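As a rough illustration of two of the ideas named in the findings above, the sketch below shows SMOTE-style oversampling (interpolating new minority-class points between existing ones) together with the F1-score and AUC computed from first principles. It is a minimal, pure-Python sketch and not the implementation used by any of the reviewed studies; the function names `smote`, `f1_score`, and `auc`, the parameter `k=3`, and the toy data are assumptions made for illustration. In practice, libraries such as imbalanced-learn and scikit-learn are commonly used for these steps.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Sketch of SMOTE: build n_new synthetic minority samples by interpolating
    between a sampled minority point and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two points (tuples of floats)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    sample receives the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy usage: four hypothetical "defective" samples in a 2-D feature space.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(minority, n_new=6)
```

Because each synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the original minority data already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation metrics are computed on data with the original class distribution.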
Abstract
The rapid advancement of software development practices has introduced challenges in ensuring quality and efficiency across the software engineering (SE) lifecycle. As SE systems grow in complexity, traditional approaches often fail to scale, resulting in longer debugging times, inefficient defect detection, and resource-heavy development cycles. Machine Learning (ML) has emerged as a key solution, enabling automation in tasks such as defect prediction, code review, and release quality estimation. However, the effectiveness of ML in SE depends on the robustness of its pipeline, including data collection, preprocessing, feature engineering, algorithm selection, validation, and evaluation. This systematic literature review (SLR) examines state-of-the-art ML pipelines designed for SE, consolidating best practices, challenges, and gaps. Our findings show that robust preprocessing, such as…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Software Engineering Techniques and Practices
