Beyond Accuracy: ROI-driven Data Analytics of Empirical Data
Gouri Deshpande, Guenther Ruhe

TL;DR
This paper argues that Return-on-Investment (ROI) should guide data analytics decisions. It proposes a conceptual framework and validates it through two case studies on requirements dependency extraction in the Mozilla Firefox project, showing how ROI can indicate how much analysis effort is worthwhile.
Contribution
The paper introduces a conceptual framework for ROI-driven data analytics and validates it with two empirical studies, showing how cost-benefit considerations can indicate when to stop analysis.
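The cost-benefit idea can be illustrated with a minimal sketch. It assumes the textbook definition ROI = (benefit - cost) / cost; the paper's own cost and benefit models are not given in this summary, and the function names and all numbers below are hypothetical.

```python
from typing import Optional, Sequence


def roi(benefit: float, cost: float) -> float:
    """Textbook ROI: net gain relative to the investment."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


def break_even_index(costs: Sequence[float], benefits: Sequence[float]) -> Optional[int]:
    """Index of the first stage where cumulative benefit covers cumulative cost."""
    for i, (c, b) in enumerate(zip(costs, benefits)):
        if roi(b, c) >= 0:
            return i
    return None  # the investment never pays off within the observed stages


def best_stopping_index(costs: Sequence[float], benefits: Sequence[float]) -> int:
    """Index of the stage with the highest ROI (a simple stopping heuristic)."""
    rois = [roi(b, c) for c, b in zip(costs, benefits)]
    return max(range(len(rois)), key=rois.__getitem__)


# Hypothetical cumulative cost and benefit after each analysis stage
# (for example, after each additional batch of labeled training data).
costs = [100.0, 200.0, 300.0, 400.0, 500.0]
benefits = [50.0, 220.0, 420.0, 520.0, 560.0]

print(break_even_index(costs, benefits))
print(best_stopping_index(costs, benefits))
```

With these illustrative figures the benefit first covers the cost at the second stage, ROI peaks at the third, and further investment yields diminishing returns.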
Findings
Fine-tuned BERT outperforms Random Forest when more than 40% of the data is used for training.
Active Learning reaches higher accuracy in fewer iterations than passive learning (random sampling).
ROI analysis helps determine the optimal stopping point for data analysis.
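The contrast between Active Learning and passive learning comes down to how the next items to label are chosen. The sketch below assumes uncertainty sampling as the query strategy, which is a common choice but is not confirmed by this summary; the function names, pair identifiers, and probabilities are hypothetical.

```python
import random
from typing import Dict, List


def select_uncertain(probs: Dict[str, float], k: int) -> List[str]:
    """Active selection: pick the k items whose predicted probability is closest to 0.5."""
    return sorted(probs, key=lambda item: abs(probs[item] - 0.5))[:k]


def select_random(probs: Dict[str, float], k: int, seed: int = 0) -> List[str]:
    """Passive selection: pick k items uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(sorted(probs), k)


# Hypothetical classifier outputs: probability that a requirement pair
# has a REQUIRES dependency, for pairs that are still unlabeled.
probs = {
    "pair_a": 0.95,
    "pair_b": 0.52,
    "pair_c": 0.10,
    "pair_d": 0.45,
    "pair_e": 0.80,
}

print(select_uncertain(probs, 2))  # the two pairs the model is least sure about
print(select_random(probs, 2))     # two pairs chosen without regard to the model
```

In each iteration the selected items are labeled by a human, added to the training set, and the classifier is retrained. Because labeling effort is part of the analysis cost, reaching a given accuracy in fewer iterations directly affects ROI.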
Abstract
This vision paper demonstrates that it is crucial to consider Return-on-Investment (ROI) when performing Data Analytics (DA). Decisions on "How much analytics is needed?" are hard to make. ROI could provide decision support on the What?, How?, and How Much? of analytics for a given problem. Method: The proposed conceptual framework is validated through two empirical studies that focus on requirements dependency extraction in the Mozilla Firefox project. The two case studies are (i) an evaluation of fine-tuned BERT against Naive Bayes and Random Forest machine learners for binary dependency classification and (ii) a comparison of Active Learning against passive learning (random sampling) for REQUIRES dependency extraction. For both cases, the analysis investment (cost) is estimated and the achievable benefit from DA is predicted in order to determine the break-even point of the investigation. Results: For the…
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Algorithms and Data Compression
Methods: Linear Layer · Softmax · Layer Normalization · Weight Decay · Dropout · Linear Warmup With Linear Decay · Dense Connections · Attention Dropout · WordPiece · Multi-Head Attention
