Combining Spreadsheet Smells for Improved Fault Prediction
Patrick Koch, Konstantin Schekotihin, Dietmar Jannach, Birgit Hofer, Franz Wotawa

TL;DR
This paper introduces a machine learning approach that combines multiple spreadsheet smell indicators using AdaBoost to improve fault prediction accuracy in spreadsheets, addressing the limitations of individual smell-based predictions.
Contribution
The study proposes an AdaBoost ensemble classifier that combines the predictions of individual spreadsheet smells, and reports significant improvements in fault prediction accuracy on two public datasets of real-world spreadsheet faults.
Findings
Improved fault prediction accuracy with ensemble approach
Combining multiple smells outperforms individual smell predictions
Evaluated on two public datasets containing real-world spreadsheet faults
Abstract
Spreadsheets are commonly used in organizations as a programming tool for business-related calculations and decision making. Since faults in spreadsheets can have severe business impacts, a number of approaches from general software engineering have been applied to spreadsheets in recent years, among them the concept of code smells. Smells can in particular be used for the task of fault prediction. An analysis of existing spreadsheet smells, however, revealed that the predictive power of individual smells can be limited. In this work we therefore propose a machine learning based approach which combines the predictions of individual smells by using an AdaBoost ensemble classifier. Experiments on two public datasets containing real-world spreadsheet faults show significant improvements in terms of fault prediction accuracy.
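The idea of combining individually weak smell predictions with AdaBoost can be illustrated with a minimal sketch. It is not the authors' implementation: the three smell flags, the toy data, and the helper names (`stump`, `train_adaboost`, `predict`) are all hypothetical, and the labels are synthetic by construction (a cell counts as faulty when at least two of three smells fire). Each weak learner is a decision stump that votes on a single smell, and discrete AdaBoost reweights the cells after every round so that the next stump focuses on the cells misclassified so far.

```python
import itertools
import math

def stump(j, polarity):
    """Weak learner: vote +1 (faulty) when smell j fires, else -1; polarity -1 flips the vote."""
    return lambda x: polarity * (1 if x[j] else -1)

def train_adaboost(X, y, rounds):
    """Discrete AdaBoost over single-smell stumps. Labels in y are +1 (faulty) or -1 (correct)."""
    n, m = len(X), len(X[0])
    w = [1.0 / n] * n                      # start with uniform cell weights
    ensemble = []                          # list of (alpha, smell index, polarity)
    for _ in range(rounds):
        best = None
        for j in range(m):
            for p in (1, -1):
                h = stump(j, p)
                err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
                if best is None or err < best[0]:
                    best = (err, j, p)
        err, j, p = best
        if err >= 0.5:                     # no stump is better than chance: stop
            break
        err = max(err, 1e-10)              # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, p))
        h = stump(j, p)
        # Increase the weight of misclassified cells, decrease the rest, then renormalize.
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all stumps; a positive score means 'faulty'."""
    score = sum(a * stump(j, p)(x) for a, j, p in ensemble)
    return 1 if score > 0 else -1

def accuracy(preds, y):
    return sum(1 for pi, yi in zip(preds, y) if pi == yi) / len(y)

# Hypothetical toy data: every combination of three binary smell flags.
# Synthetic rule (an assumption for illustration): faulty if at least two smells fire.
X = list(itertools.product([0, 1], repeat=3))
y = [1 if sum(x) >= 2 else -1 for x in X]

best_single = max(accuracy([stump(j, 1)(x) for x in X], y) for j in range(3))
ensemble = train_adaboost(X, y, rounds=3)
ensemble_acc = accuracy([predict(ensemble, x) for x in X], y)

print("best single smell accuracy:", best_single)
print("AdaBoost ensemble accuracy:", ensemble_acc)
```

On this toy data no single smell classifies more than 6 of the 8 cells correctly (accuracy 0.75), while the three-round ensemble classifies all of them correctly. The numbers say nothing about the paper's datasets; they only show the mechanism by which a weighted vote over smells can outperform each smell alone.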
