How Much Data Analytics is Enough? The ROI of Machine Learning Classification and its Application to Requirements Dependency Classification
Gouri Deshpande, Guenther Ruhe, Chad Saunders

TL;DR
This paper emphasizes the importance of considering ROI alongside accuracy when selecting machine learning techniques for requirements dependency classification, demonstrating that ROI analysis can significantly alter decision-making.
Contribution
It introduces an approach that integrates ROI considerations into ML technique selection, extending beyond traditional accuracy-based criteria in software engineering applications.
Findings
ROI considerations can drastically change ML technique choices.
The performance of Random Forest and BERT varies significantly when ROI is included in the evaluation.
Recommendations for ML selection based on training data size and ROI are provided.
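To make the first finding concrete, the sketch below compares two candidate techniques by accuracy and by ROI, using the standard definition ROI = (benefit - cost) / cost. It is an illustration only, not the paper's cost model. The `Candidate` class, the linear benefit assumption, and every number are hypothetical and are not taken from the paper's results.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float  # fraction of dependencies classified correctly
    cost: float      # assumed total cost of labelling, training, and evaluation


def roi(benefit: float, cost: float) -> float:
    """Standard return on investment: (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


def benefit(c: Candidate, n_items: int, value_per_correct: float) -> float:
    # Assumption: benefit grows linearly with the number of correct classifications.
    return c.accuracy * n_items * value_per_correct


# Hypothetical figures, chosen only to show how the ranking can flip.
candidates = [
    Candidate("cheaper model", accuracy=0.80, cost=1_000.0),
    Candidate("costlier model", accuracy=0.90, cost=3_000.0),
]
N_ITEMS = 1_000           # hypothetical number of requirement pairs to classify
VALUE_PER_CORRECT = 5.0   # hypothetical value of one correct classification

best_by_accuracy = max(candidates, key=lambda c: c.accuracy)
best_by_roi = max(
    candidates,
    key=lambda c: roi(benefit(c, N_ITEMS, VALUE_PER_CORRECT), c.cost),
)

print("Best by accuracy:", best_by_accuracy.name)
print("Best by ROI:", best_by_roi.name)
```

With these made-up inputs, the more accurate candidate wins on accuracy while the cheaper one wins on ROI, which is the kind of reversal the finding describes. A realistic analysis would replace the linear benefit assumption with costs and benefits measured across the ML lifecycle.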
Abstract
Machine Learning (ML) can substantially improve the efficiency and effectiveness of organizations and is widely used for different purposes within Software Engineering. However, the selection and implementation of ML techniques rely almost exclusively on accuracy criteria. Thus, for organizations wishing to realize the benefits of ML investments, this narrow approach ignores crucial considerations around the anticipated costs of the ML activities across the ML lifecycle, while failing to account for the benefits that are likely to accrue from the proposed activity. We present findings for an approach that addresses this gap by enhancing the accuracy criterion with return on investment (ROI) considerations. Specifically, we analyze the performance of the two state-of-the-art ML techniques: Random Forest and Bidirectional Encoder Representations from Transformers (BERT), based on accuracy…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Data Quality and Management
